Since 2012, the Centers for Medicare & Medicaid Services (CMS) has run the Hospital Readmissions Reduction Program (HRRP). The program tracks hospital readmission rates and uses financial penalties to incentivize hospitals to reduce unnecessary readmissions. Using HRRP readmission data from 2019-2022, this analysis aims to identify preferred and non-preferred hospitals for hip and knee replacements on behalf of a health insurance company. It also examines the risk factors associated with higher readmission rates for these procedures.
What risk factors are associated with hospital readmission rates for hip/knee replacements?
Understanding these risk factors can help health insurance companies guide patients towards hospitals with better outcomes, thereby improving patient outcomes and reducing costs associated with readmissions.
The insights from this analysis can be used to improve hospital performance, enhance patient care, and reduce costs. As of 2019, the average cost of readmission after hip/knee surgery was $8,588, and avoiding that cost would be highly beneficial for health insurance companies and consumers alike (Phillips et al., 2019).
Previous analyses have applied Logistic Regression and Random Forest models to these or similar datasets to identify the most important risk factors for hospital readmission after hip/knee replacement. We aim to build on that work by improving model performance using various techniques. Prior analyses have used Random Forest models to extract important risk factors, but none have used Random Forest to classify hospitals as preferred or non-preferred for hip/knee replacement based on those risk factors.
We hypothesize that hospitals with better Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) scores will have lower readmission rates for hip/knee replacements, because higher patient satisfaction often correlates with better overall care quality and patient outcomes, including reduced complications and better post-discharge support (Edwards et al., 2015).
We will use datasets from the Centers for Medicare and Medicaid Services (Centers for Medicare & Medicaid Services, 2024). Our target variable is the readmission rate after hip/knee surgery, based on data from 2019-2022. Predictors will come from the HCAHPS dataset; Timely and Effective Care, which contains average wait times and vaccination compliance; Complications and Deaths, which contains the frequency of deaths and complications for procedures; and Payment and Spending, which includes the costs associated with procedures.
We will consider our analysis successful if we can identify clear risk factors associated with hospital readmission rates and accurately classify hospitals as preferred or non-preferred.
Our hypothesis will be supported if hospitals with better HCAHPS scores demonstrate statistically significantly lower readmission rates for hip/knee replacements.
One potential pitfall of our analysis plan is data quality and completeness. The dataset contains missing values and will need preprocessing to handle them, along with outliers and inconsistencies. Another is that we may lack the computing power to apply deep learning to a dataset of this size. Finally, we need to watch for overfitting, which we will detect if model accuracy on the training set far exceeds accuracy on the test set.
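The overfitting check described above can be sketched as follows. This is a minimal illustration on simulated data, not our actual model: the data frame `toy`, the predictor `x`, the 80/20 split, and the logistic regression are all assumptions made for the example.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch reproducible
n <- 400

# Simulated stand-in data: one informative predictor and a binary label
toy <- data.frame(x = rnorm(n))
toy$preferred <- rbinom(n, 1, plogis(1.5 * toy$x))

# Hold out 20% of the rows as a test set
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit a simple classifier on the training rows only
fit <- glm(preferred ~ x, data = train, family = binomial)

# Share of rows whose predicted class (0.5 cut-off) matches the label
accuracy <- function(model, data) {
  pred <- as.integer(predict(model, newdata = data, type = "response") > 0.5)
  mean(pred == data$preferred)
}

train_acc <- accuracy(fit, train)
test_acc  <- accuracy(fit, test)

# A large positive gap would suggest overfitting
gap <- train_acc - test_acc
```

What counts as a "large" gap is a judgment call that depends on the model and the sample size, so any threshold would need to be chosen for the real data.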
# Load the packages used in this section
library(tidyverse)   # readr, dplyr, tidyr, tibble
library(janitor)     # clean_names()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(naniar)      # replace_with_na_all()
# Set the directory for the data files
filepath <- "/Users/adelinecasali/Desktop/hospitals_current_data/"
# List the files in the directory whose names end in "Hospital.csv"
files <- list.files(path = filepath, pattern = "Hospital\\.csv$")
# Iterate through each file in the list
for (f in seq_along(files)) {
  # Read the CSV, clean column names to upper camel case, and store in "dat"
  dat <- clean_names(read_csv(paste0(filepath, files[f]),
                              show_col_types = FALSE),
                     case = "upper_camel")
  # Remove the separator and "Hospital.csv" from the file name to create a variable name
  filename <- gsub(".Hospital\\.csv", "", files[f])
  # Assign data to a variable with the above created name
  assign(filename, dat)
}
# Create a df of file names without ".Hospital.csv"
files <- gsub(".Hospital\\.csv", "", files) %>% data.frame()
# Set column name of the df to "File Name"
names(files) <- "File Name"
files %>%
  kable(
    format = "html",
    caption = "Table 1. List of hospital-level data files.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
| File Name |
|---|
| Complications_and_Deaths |
| FY_2024_HAC_Reduction_Program |
| FY_2024_Hospital_Readmissions_Reduction_Program |
| HCAHPS |
| Healthcare_Associated_Infections |
| Maternal_Health |
| Medicare_Hospital_Spending_Per_Patient |
| Outpatient_Imaging_Efficiency |
| Payment_and_Value_of_Care |
| Timely_and_Effective_Care |
| Unplanned_Hospital_Visits |
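As an alternative to `assign()`, the files could be read into a single named list, which keeps the global environment tidy and makes it easy to loop over all datasets. The sketch below uses base R and two tiny stand-in CSVs written to a temporary directory. The hyphen separator in the file names and the object name `hospital_data` are assumptions for illustration.

```r
# Toy directory with two small CSVs standing in for the CMS downloads
tmp_dir <- file.path(tempdir(), "hospital_files_sketch")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(FacilityId = c("010001", "010005")),
          file.path(tmp_dir, "HCAHPS-Hospital.csv"), row.names = FALSE)
write.csv(data.frame(FacilityId = "010001"),
          file.path(tmp_dir, "Maternal_Health-Hospital.csv"), row.names = FALSE)

# Full paths of every file whose name ends in "Hospital.csv"
csv_paths <- list.files(tmp_dir, pattern = "Hospital\\.csv$", full.names = TRUE)

# Read each file into one list; reading as character keeps leading zeros in IDs
hospital_data <- lapply(csv_paths, read.csv, colClasses = "character")

# Name each element after its file, minus the separator and "Hospital.csv"
names(hospital_data) <- sub(".Hospital\\.csv$", "", basename(csv_paths))

names(hospital_data)
```

Individual datasets are then available as, for example, `hospital_data$HCAHPS`.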
# Display first 10 rows of FY_2024_Hospital_Readmissions_Reduction_Program
head(FY_2024_Hospital_Readmissions_Reduction_Program,10)
## # A tibble: 10 × 12
## FacilityName FacilityId State MeasureName NumberOfDischarges Footnote
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 SOUTHEAST HEALTH ME… 010001 AL READM-30-H… N/A NA
## 2 SOUTHEAST HEALTH ME… 010001 AL READM-30-H… 616 NA
## 3 SOUTHEAST HEALTH ME… 010001 AL READM-30-A… 274 NA
## 4 SOUTHEAST HEALTH ME… 010001 AL READM-30-P… 404 NA
## 5 SOUTHEAST HEALTH ME… 010001 AL READM-30-C… 126 NA
## 6 SOUTHEAST HEALTH ME… 010001 AL READM-30-C… 117 NA
## 7 MARSHALL MEDICAL CE… 010005 AL READM-30-A… N/A 1
## 8 MARSHALL MEDICAL CE… 010005 AL READM-30-C… 137 NA
## 9 MARSHALL MEDICAL CE… 010005 AL READM-30-P… 285 NA
## 10 MARSHALL MEDICAL CE… 010005 AL READM-30-H… 129 NA
## # ℹ 6 more variables: ExcessReadmissionRatio <chr>,
## # PredictedReadmissionRate <chr>, ExpectedReadmissionRate <chr>,
## # NumberOfReadmissions <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  select_if(is.numeric)
# Count NAs per numeric column (character columns are not checked here)
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## Footnote
## 12077
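The check above only covers numeric columns, while several columns in this file are stored as character and encode missingness as text. A minimal sketch of counting those placeholder strings is shown below. The toy data frame and the vector `placeholders` are assumptions for illustration, with the two strings taken from values seen in this dataset.

```r
# Toy data frame mimicking the character-coded columns shown above
toy <- data.frame(
  NumberOfDischarges   = c("N/A", "616", "274", "N/A"),
  NumberOfReadmissions = c("Too Few to Report", "149", "32", "N/A"),
  stringsAsFactors = FALSE
)

# Strings that should be treated as missing or suppressed
placeholders <- c("N/A", "Too Few to Report")

# For each column, count true NAs plus cells holding a placeholder string
miss_like <- sapply(toy, function(x) sum(is.na(x) | x %in% placeholders))
print(miss_like)
```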
# Use the function "replace_with_na_all()" to replace aberrant values with NA
FY_2024_Hospital_Readmissions_Reduction_Program <- replace_with_na_all(FY_2024_Hospital_Readmissions_Reduction_Program, condition = ~ .x == "N/A")
# Replace "Too Few to Report" values with "5" using gsub (a temporary marker for suppressed counts)
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- gsub("Too Few to Report", "5", FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)
# Check first 10 rows to confirm that it worked
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions, 10)
## [1] "5" "149" "32" "68" "11" "20" NA "14" "40" "24"
# Convert NumberOfReadmissions to numeric so the "5" markers can be replaced with integers
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- as.numeric(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)
# Find all values of "5" in NumberOfReadmissions
fives <- which(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions == 5)
# Replace values of "5" with random integers from 1 to 10 (results vary between runs unless a seed is set)
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions[fives] <- sample(1:10, length(fives), replace = TRUE)
# Check the first 20 rows to see if this was applied correctly
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions,20)
## [1] 3 149 32 68 11 20 NA 14 40 24 3 NA 10 21 15 83 36 75 2
## [20] NA
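The random replacement above could be made reproducible by setting a seed, and flagging the suppressed cells before conversion avoids relying on a marker value. The sketch below is one way to do this on a toy vector. The seed, the vector `readmissions`, and the genuine `"5"` entry are assumptions made for the example.

```r
set.seed(2024)  # arbitrary seed chosen for this sketch

# Toy vector with suppressed counts, a missing value, and a genuine "5"
readmissions <- c("Too Few to Report", "149", "32", NA, "Too Few to Report", "5")

# Flag suppressed cells first so that genuine values are never overwritten
suppressed <- !is.na(readmissions) & readmissions == "Too Few to Report"

# Convert to numeric; the placeholder strings become NA at this point
readmissions_num <- suppressWarnings(as.numeric(readmissions))

# Fill only the flagged cells with random integers from 1 to 10
readmissions_num[suppressed] <- sample(1:10, sum(suppressed), replace = TRUE)
```

With this approach a real count of 5, should one ever appear, would be left untouched.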
# Select the columns to convert
columns_to_convert <- c("NumberOfDischarges", "ExcessReadmissionRatio", "PredictedReadmissionRate", "ExpectedReadmissionRate", "NumberOfReadmissions")
# Use mutate() with across() to convert the specified columns to numeric
FY_2024_Hospital_Readmissions_Reduction_Program <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(across(all_of(columns_to_convert), as.numeric))
# Print the structure of the dataframe to check the changes
str(FY_2024_Hospital_Readmissions_Reduction_Program)
## tibble [18,774 × 12] (S3: tbl_df/tbl/data.frame)
## $ FacilityName : chr [1:18774] "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" ...
## $ FacilityId : chr [1:18774] "010001" "010001" "010001" "010001" ...
## $ State : chr [1:18774] "AL" "AL" "AL" "AL" ...
## $ MeasureName : chr [1:18774] "READM-30-HIP-KNEE-HRRP" "READM-30-HF-HRRP" "READM-30-AMI-HRRP" "READM-30-PN-HRRP" ...
## $ NumberOfDischarges : num [1:18774] NA 616 274 404 126 117 NA 137 285 129 ...
## $ Footnote : num [1:18774] NA NA NA NA NA NA 1 NA NA NA ...
## $ ExcessReadmissionRatio : num [1:18774] 0.892 1.1 0.933 0.987 0.952 ...
## $ PredictedReadmissionRate: num [1:18774] 3.53 23.13 12.9 17.05 9.81 ...
## $ ExpectedReadmissionRate : num [1:18774] 3.96 21.02 13.83 17.28 10.31 ...
## $ NumberOfReadmissions : num [1:18774] 3 149 32 68 11 20 NA 14 40 24 ...
## $ StartDate : chr [1:18774] "07/01/2019" "07/01/2019" "07/01/2019" "07/01/2019" ...
## $ EndDate : chr [1:18774] "06/30/2022" "06/30/2022" "06/30/2022" "06/30/2022" ...
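Because `as.numeric()` turns any string it cannot parse into `NA`, it can be worth checking which raw values were lost during conversion. The following sketch shows one way to list them. The vector `raw` and its contents are made up for illustration.

```r
# Toy character vector: two valid numbers, one placeholder, one typo
raw <- c("0.8916", "N/A", "1.1003", "abc")

# Coerce to numeric; unparseable strings become NA
converted <- suppressWarnings(as.numeric(raw))

# Values that were present in the raw data but failed to parse
newly_missing <- raw[!is.na(raw) & is.na(converted)]
print(newly_missing)
```

If this list contains anything other than known placeholders, the column probably needs further cleaning before conversion.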
FY_2024_Hospital_Readmissions_Reduction_Program <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(MeasureName = gsub("READM-30-", "", MeasureName)) %>%
  mutate(MeasureName = gsub("-HRRP", "", MeasureName))
dict <- tribble(
~Acronym, ~Definition,
"HIP-KNEE", "Total Hip/Knee Arthroplasty",
"HF", "Heart Failure",
"COPD", "Chronic Obstructive Pulmonary Disease",
"AMI", "Acute Myocardial Infarction",
"CABG", "Coronary Artery Bypass Graft",
"PN", "Pneumonia"
)
dict %>%
  kable(
    format = "html",
    caption = "Table 2. Acronyms of medical conditions for which hospital readmissions are tracked.") %>%
  kable_styling(bootstrap_options = "hover", full_width = FALSE)
| Acronym | Definition |
|---|---|
| HIP-KNEE | Total Hip/Knee Arthroplasty |
| HF | Heart Failure |
| COPD | Chronic Obstructive Pulmonary Disease |
| AMI | Acute Myocardial Infarction |
| CABG | Coronary Artery Bypass Graft |
| PN | Pneumonia |
readmissionsClean <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  pivot_wider(
    names_from = MeasureName,
    values_from = c(NumberOfDischarges, ExcessReadmissionRatio, PredictedReadmissionRate, ExpectedReadmissionRate, NumberOfReadmissions),
    id_cols = c(FacilityName, FacilityId, State, StartDate, EndDate)
  )
# Check the new dataframe
dim(readmissionsClean)
## [1] 3129 35
head(readmissionsClean)
## # A tibble: 6 × 35
## FacilityName FacilityId State StartDate EndDate NumberOfDischarges_H…¹
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 SOUTHEAST HEALTH ME… 010001 AL 07/01/20… 06/30/… NA
## 2 MARSHALL MEDICAL CE… 010005 AL 07/01/20… 06/30/… NA
## 3 NORTH ALABAMA MEDIC… 010006 AL 07/01/20… 06/30/… NA
## 4 MIZELL MEMORIAL HOS… 010007 AL 07/01/20… 06/30/… NA
## 5 CRENSHAW COMMUNITY … 010008 AL 07/01/20… 06/30/… NA
## 6 ST. VINCENT'S EAST 010011 AL 07/01/20… 06/30/… NA
## # ℹ abbreviated name: ¹`NumberOfDischarges_HIP-KNEE`
## # ℹ 29 more variables: NumberOfDischarges_HF <dbl>,
## # NumberOfDischarges_AMI <dbl>, NumberOfDischarges_PN <dbl>,
## # NumberOfDischarges_CABG <dbl>, NumberOfDischarges_COPD <dbl>,
## # `ExcessReadmissionRatio_HIP-KNEE` <dbl>, ExcessReadmissionRatio_HF <dbl>,
## # ExcessReadmissionRatio_AMI <dbl>, ExcessReadmissionRatio_PN <dbl>,
## # ExcessReadmissionRatio_CABG <dbl>, ExcessReadmissionRatio_COPD <dbl>, …
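To make the reshaping above concrete, here is a small sketch of the same long-to-wide pivot on a toy table with two facilities and two measures. The values are illustrative rather than taken from the CMS file, and `names_prefix` is used here only because the toy table has a single value column.

```r
library(tidyr)

# Toy long-format table: one row per facility and measure
long <- data.frame(
  FacilityId  = c("010001", "010001", "010005", "010005"),
  MeasureName = c("HIP-KNEE", "HF", "HIP-KNEE", "HF"),
  ExcessReadmissionRatio = c(0.89, 1.10, 1.02, 0.97)
)

# Pivot so that each facility becomes one row and each measure one column
wide <- pivot_wider(
  long,
  id_cols      = FacilityId,
  names_from   = MeasureName,
  values_from  = ExcessReadmissionRatio,
  names_prefix = "ExcessReadmissionRatio_"
)

# Keep the ID plus any column whose name ends in "HIP-KNEE"
hip_knee <- wide[, c("FacilityId", grep("HIP-KNEE$", names(wide), value = TRUE))]
```

Each facility ends up on a single row, which is the shape needed to join these measures to the other hospital-level datasets by `FacilityId`.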
readmissionsClean <- readmissionsClean %>%
  select(FacilityName, FacilityId, State, matches("HIP-KNEE$"))
# Display first 10 rows of HCAHPS
head(HCAHPS,10)
## # A tibble: 10 × 22
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 6 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 7 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 8 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 9 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 10 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## # ℹ 15 more variables: TelephoneNumber <chr>, HcahpsMeasureId <chr>,
## # HcahpsQuestion <chr>, HcahpsAnswerDescription <chr>,
## # PatientSurveyStarRating <chr>, PatientSurveyStarRatingFootnote <dbl>,
## # HcahpsAnswerPercent <chr>, HcahpsAnswerPercentFootnote <chr>,
## # HcahpsLinearMeanValue <chr>, NumberOfCompletedSurveys <chr>,
## # NumberOfCompletedSurveysFootnote <chr>, SurveyResponseRatePercent <chr>,
## # SurveyResponseRatePercentFootnote <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- HCAHPS %>%
  select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## PatientSurveyStarRatingFootnote
## 430641
# Removing all footnote columns
HCAHPS <- HCAHPS %>%
  select(-ends_with("footnote"))
# Replacing all "Not Applicable" with NA
HCAHPS <- as.data.frame(sapply(HCAHPS, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))
# Replacing all "Not Available" with NA
HCAHPS <- as.data.frame(sapply(HCAHPS, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
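The two passes above could also be written as a single helper that handles several placeholder strings at once. The sketch below is one possible version. The function name `replace_placeholders` and the toy data are hypothetical. It uses `lapply()` with `df[] <-` rather than `sapply()`, so non-character columns keep their original type instead of being coerced to character.

```r
# Hypothetical helper: set several placeholder strings to NA in one pass
replace_placeholders <- function(df, placeholders = c("Not Applicable", "Not Available")) {
  df[] <- lapply(df, function(x) {
    if (is.character(x)) x[x %in% placeholders] <- NA
    x
  })
  df
}

# Toy data with both placeholders and one numeric column
toy <- data.frame(
  HcahpsAnswerPercent     = c("78", "Not Applicable", "Not Available"),
  PatientSurveyStarRating = c("Not Applicable", "4", "Not Available"),
  Surveys                 = c(310, 120, 95),
  stringsAsFactors = FALSE
)

cleaned <- replace_placeholders(toy)
```

The same helper could be reused for the other datasets that share these placeholder strings.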
dictHCAHPS <- tribble(
~`Measure ID`, ~`Measure Name`,
"H-CLEAN-HSP-A-P", "Patients who reported that their room and bathroom were 'Always' clean",
"H-CLEAN-HSP-SN-P", "Patients who reported that their room and bathroom were 'Sometimes' or 'Never' clean",
"H-CLEAN-HSP-U-P", "Patients who reported that their room and bathroom were 'Usually' clean",
"H-CLEAN-HSP-STAR-RATING", "Cleanliness - star rating",
"H_CLEAN_LINEAR_SCORE", "Cleanliness - linear mean score",
"H-COMP-1-A-P", "Patients who reported that their nurses 'Always' communicated well",
"H-COMP-1-SN-P", "Patients who reported that their nurses 'Sometimes' or 'Never' communicated well",
"H-COMP-1-U-P", "Patients who reported that their nurses 'Usually' communicated well",
"H-COMP-1-STAR-RATING", "Nurse communication - star rating",
"H_COMP_1_LINEAR_SCORE", "Nurse communication - linear mean score",
"H-COMP-2-A-P", "Patients who reported that their doctors 'Always' communicated well",
"H-COMP-2-SN-P", "Patients who reported that their doctors 'Sometimes' or 'Never' communicated well",
"H-COMP-2-U-P", "Patients who reported that their doctors 'Usually' communicated well",
"H-COMP-2-STAR-RATING", "Doctor communication - star rating",
"H_COMP_2_LINEAR_SCORE", "Doctor communication - linear mean score",
"H-COMP-3-A-P", "Patients who reported that they 'Always' received help as soon as they wanted",
"H-COMP-3-SN-P", "Patients who reported that they 'Sometimes' or 'Never' received help as soon as they wanted",
"H-COMP-3-U-P", "Patients who reported that they 'Usually' received help as soon as they wanted",
"H-COMP-3-STAR-RATING", "Staff responsiveness - star rating",
"H_COMP_3_LINEAR_SCORE", "Staff responsiveness - linear mean score",
"H-COMP-5-A-P", "Patients who reported that staff 'Always' explained about medicines before giving it to them",
"H-COMP-5-SN-P", "Patients who reported that staff 'Sometimes' or 'Never' explained about medicines before giving it to them",
"H-COMP-5-U-P", "Patients who reported that staff 'Usually' explained about medicines before giving it to them",
"H-COMP-5-STAR-RATING", "Communication about medicine - star rating",
"H_COMP_5_LINEAR_SCORE", "Communication about medicines - linear mean score",
"H-COMP-6-N-P", "Patients who reported that NO, they were not given information about what to do during their recovery at home",
"H-COMP-6-Y-P", "Patients who reported that YES, they were given information about what to do during their recovery at home",
"H-COMP-6-STAR-RATING", "Discharge information - star rating",
"H_COMP_6_LINEAR_SCORE", "Discharge information - linear mean score",
"H-COMP-7-A", "Patients who 'Agree' they understood their care when they left the hospital",
"H-COMP-7-D-SD", "Patients who 'Disagree' or 'Strongly Disagree' that they understood their care when they left the hospital",
"H-COMP-7-SA", "Patients who 'Strongly Agree' that they understood their care when they left the hospital",
"H-COMP-7-STAR-RATING", "Care transition - star rating",
"H_COMP_7_LINEAR_SCORE", "Care transition - linear mean score",
"H-HSP-RATING-0-6", "Patients who gave their hospital a rating of 6 or lower on a scale from 0 (lowest) to 10 (highest)",
"H-HSP-RATING-7-8", "Patients who gave their hospital a rating of 7 or 8 on a scale from 0 (lowest) to 10 (highest)",
"H-HSP-RATING-9-10", "Patients who gave their hospital a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest)",
"H-HSP-RATING-STAR-RATING", "Overall rating of hospital - star rating",
"H_HSP_RATING_LINEAR_SCORE", "Overall hospital rating - linear mean score",
"H-QUIET-HSP-A-P", "Patients who reported that the area around their room was 'Always' quiet at night",
"H-QUIET-HSP-SN-P", "Patients who reported that the area around their room was 'Sometimes' or 'Never' quiet at night",
"H-QUIET-HSP-U-P", "Patients who reported that the area around their room was 'Usually' quiet at night",
"H-QUIET-HSP-STAR-RATING", "Quietness - star rating",
"H_QUIET_LINEAR_SCORE", "Quietness - linear mean score",
"H-RECMND-DN", "Patients who reported NO, they would probably not or definitely not recommend the hospital",
"H-RECMND-DY", "Patients who reported YES, they would definitely recommend the hospital",
"H-RECMND-PY", "Patients who reported YES, they would probably recommend the hospital",
"H-RECMND-STAR-RATING", "Recommend hospital - star rating",
"H_RECMND_LINEAR_SCORE", "Recommend hospital - linear mean score",
"H-STAR-RATING", "Summary star rating"
)
dictHCAHPS %>%
  kable(
    format = "html",
    caption = "Table 3. Measure IDs and Measure Names from HCAHPS") %>%
  kable_styling(bootstrap_options = "hover", full_width = FALSE)
| Measure ID | Measure Name |
|---|---|
| H-CLEAN-HSP-A-P | Patients who reported that their room and bathroom were ‘Always’ clean |
| H-CLEAN-HSP-SN-P | Patients who reported that their room and bathroom were ‘Sometimes’ or ‘Never’ clean |
| H-CLEAN-HSP-U-P | Patients who reported that their room and bathroom were ‘Usually’ clean |
| H-CLEAN-HSP-STAR-RATING | Cleanliness - star rating |
| H_CLEAN_LINEAR_SCORE | Cleanliness - linear mean score |
| H-COMP-1-A-P | Patients who reported that their nurses ‘Always’ communicated well |
| H-COMP-1-SN-P | Patients who reported that their nurses ‘Sometimes’ or ‘Never’ communicated well |
| H-COMP-1-U-P | Patients who reported that their nurses ‘Usually’ communicated well |
| H-COMP-1-STAR-RATING | Nurse communication - star rating |
| H_COMP_1_LINEAR_SCORE | Nurse communication - linear mean score |
| H-COMP-2-A-P | Patients who reported that their doctors ‘Always’ communicated well |
| H-COMP-2-SN-P | Patients who reported that their doctors ‘Sometimes’ or ‘Never’ communicated well |
| H-COMP-2-U-P | Patients who reported that their doctors ‘Usually’ communicated well |
| H-COMP-2-STAR-RATING | Doctor communication - star rating |
| H_COMP_2_LINEAR_SCORE | Doctor communication - linear mean score |
| H-COMP-3-A-P | Patients who reported that they ‘Always’ received help as soon as they wanted |
| H-COMP-3-SN-P | Patients who reported that they ‘Sometimes’ or ‘Never’ received help as soon as they wanted |
| H-COMP-3-U-P | Patients who reported that they ‘Usually’ received help as soon as they wanted |
| H-COMP-3-STAR-RATING | Staff responsiveness - star rating |
| H_COMP_3_LINEAR_SCORE | Staff responsiveness - linear mean score |
| H-COMP-5-A-P | Patients who reported that staff ‘Always’ explained about medicines before giving it to them |
| H-COMP-5-SN-P | Patients who reported that staff ‘Sometimes’ or ‘Never’ explained about medicines before giving it to them |
| H-COMP-5-U-P | Patients who reported that staff ‘Usually’ explained about medicines before giving it to them |
| H-COMP-5-STAR-RATING | Communication about medicine - star rating |
| H_COMP_5_LINEAR_SCORE | Communication about medicines - linear mean score |
| H-COMP-6-N-P | Patients who reported that NO, they were not given information about what to do during their recovery at home |
| H-COMP-6-Y-P | Patients who reported that YES, they were given information about what to do during their recovery at home |
| H-COMP-6-STAR-RATING | Discharge information - star rating |
| H_COMP_6_LINEAR_SCORE | Discharge information - linear mean score |
| H-COMP-7-A | Patients who ‘Agree’ they understood their care when they left the hospital |
| H-COMP-7-D-SD | Patients who ‘Disagree’ or ‘Strongly Disagree’ that they understood their care when they left the hospital |
| H-COMP-7-SA | Patients who ‘Strongly Agree’ that they understood their care when they left the hospital |
| H-COMP-7-STAR-RATING | Care transition - star rating |
| H_COMP_7_LINEAR_SCORE | Care transition - linear mean score |
| H-HSP-RATING-0-6 | Patients who gave their hospital a rating of 6 or lower on a scale from 0 (lowest) to 10 (highest) |
| H-HSP-RATING-7-8 | Patients who gave their hospital a rating of 7 or 8 on a scale from 0 (lowest) to 10 (highest) |
| H-HSP-RATING-9-10 | Patients who gave their hospital a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest) |
| H-HSP-RATING-STAR-RATING | Overall rating of hospital - star rating |
| H_HSP_RATING_LINEAR_SCORE | Overall hospital rating - linear mean score |
| H-QUIET-HSP-A-P | Patients who reported that the area around their room was ‘Always’ quiet at night |
| H-QUIET-HSP-SN-P | Patients who reported that the area around their room was ‘Sometimes’ or ‘Never’ quiet at night |
| H-QUIET-HSP-U-P | Patients who reported that the area around their room was ‘Usually’ quiet at night |
| H-QUIET-HSP-STAR-RATING | Quietness - star rating |
| H_QUIET_LINEAR_SCORE | Quietness - linear mean score |
| H-RECMND-DN | Patients who reported NO, they would probably not or definitely not recommend the hospital |
| H-RECMND-DY | Patients who reported YES, they would definitely recommend the hospital |
| H-RECMND-PY | Patients who reported YES, they would probably recommend the hospital |
| H-RECMND-STAR-RATING | Recommend hospital - star rating |
| H_RECMND_LINEAR_SCORE | Recommend hospital - linear mean score |
| H-STAR-RATING | Summary star rating |
HCAHPSClean <- HCAHPS %>%
  pivot_wider(
    names_from = HcahpsMeasureId,
    values_from = c(PatientSurveyStarRating, HcahpsAnswerPercent, HcahpsLinearMeanValue, SurveyResponseRatePercent),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(HCAHPSClean)
## [1] 4814 375
head(HCAHPSClean)
## # A tibble: 6 × 375
## FacilityName FacilityId State PatientSurveyStarRat…¹ PatientSurveyStarRat…²
## <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEAL… 010001 AL <NA> <NA>
## 2 MARSHALL MEDIC… 010005 AL <NA> <NA>
## 3 NORTH ALABAMA … 010006 AL <NA> <NA>
## 4 MIZELL MEMORIA… 010007 AL <NA> <NA>
## 5 CRENSHAW COMMU… 010008 AL <NA> <NA>
## 6 ST. VINCENT'S … 010011 AL <NA> <NA>
## # ℹ abbreviated names: ¹PatientSurveyStarRating_H_COMP_1_A_P,
## # ²PatientSurveyStarRating_H_COMP_1_SN_P
## # ℹ 370 more variables: PatientSurveyStarRating_H_COMP_1_U_P <chr>,
## # PatientSurveyStarRating_H_COMP_1_LINEAR_SCORE <chr>,
## # PatientSurveyStarRating_H_COMP_1_STAR_RATING <chr>,
## # PatientSurveyStarRating_H_NURSE_RESPECT_A_P <chr>,
## # PatientSurveyStarRating_H_NURSE_RESPECT_SN_P <chr>, …
# Display first 10 rows of Timely_and_Effective_Care
head(Timely_and_Effective_Care,10)
## # A tibble: 10 × 16
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 6 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 7 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 8 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 9 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 10 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## # ℹ 9 more variables: TelephoneNumber <chr>, Condition <chr>, MeasureId <chr>,
## # MeasureName <chr>, Score <chr>, Sample <chr>, Footnote <chr>,
## # StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Timely_and_Effective_Care %>%
  select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()
# Replacing all "Not Applicable" with NA
Timely_and_Effective_Care <- as.data.frame(sapply(Timely_and_Effective_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))
# Replacing all "Not Available" with NA
Timely_and_Effective_Care <- as.data.frame(sapply(Timely_and_Effective_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
dictCare <- tribble(
~`Measure ID`, ~`Measure Name`,
"EDV", "Emergency department volume (alternate Measure ID: EDV-1)",
"ED-2", "Average (median) admit decision time to time of departure from the emergency department for emergency department patients admitted to inpatient status",
"IMM-3", "Healthcare workers given influenza vaccination",
"HCP COVID-19", "COVID-19 Vaccination Coverage Among HCP",
"OP-18b", "Average (median) time patients spent in the emergency department before leaving from the visit (alternate Measure ID: OP-18)",
"OP-18c", "Average time patients spent in the emergency department before being sent home (Median Time from ED Arrival to ED Departure for Discharged ED Patients – Psychiatric/Mental Health Patients) *This measure is only found in the downloadable database, it is not displayed on Hospital Care Compare",
"OP-22", "Percentage of patients who left the emergency department before being seen",
"OP-23", "Percentage of patients who came to the emergency department with stroke symptoms who received brain scan results within 45 minutes of arrival",
"OP-29", "Percentage of patients receiving appropriate recommendation for follow-up screening colonoscopy",
"OP-31", "Percentage of patients who had cataract surgery and had improvement in visual function within 90 days following the surgery",
"SEP-1", "Severe Sepsis and Septic Shock",
"SEP-SH-3HR", "Septic Shock 3 Hour",
"SEP-SH-6HR", "Septic Shock 6 Hour",
"SEV-SEP-3HR", "Severe Sepsis 3 Hour",
"SEV-SEP-6HR", "Severe Sepsis 6 Hour",
"STK-02", "Percentage of ischemic stroke patients prescribed or continuing to take antithrombotic therapy at hospital discharge",
"STK-03", "Percentage of ischemic stroke patients with atrial fibrillation/flutter who are prescribed or continuing to take anticoagulation therapy at hospital discharge",
"STK-05", "Percentage of ischemic stroke patients administered antithrombotic therapy by the end of hospital day 2",
"STK-06", "Percentage of ischemic stroke patients who are prescribed or continuing to take statin medication at hospital discharge",
"VTE-1", "Percentage of patients that received VTE prophylaxis after hospital admission or surgery",
"VTE-2", "Percentage of patients that received VTE prophylaxis after being admitted to the intensive care unit (ICU)",
"Safe Use of Opioids", "Percentage of patients who were prescribed 2 or more opioids or an opioid and benzodiazepine concurrently at discharge"
)
dictCare %>%
  kable(
    format = "html",
    caption = "Table 4. Measure IDs and Measure Names from Timely and Effective Care") %>%
  kable_styling(bootstrap_options = "hover", full_width = FALSE)
| Measure ID | Measure Name |
|---|---|
| EDV | Emergency department volume (alternate Measure ID: EDV-1) |
| ED-2 | Average (median) admit decision time to time of departure from the emergency department for emergency department patients admitted to inpatient status |
| IMM-3 | Healthcare workers given influenza vaccination |
| HCP COVID-19 | COVID-19 Vaccination Coverage Among HCP |
| OP-18b | Average (median) time patients spent in the emergency department before leaving from the visit (alternate Measure ID: OP-18) |
| OP-18c | Average time patients spent in the emergency department before being sent home (Median Time from ED Arrival to ED Departure for Discharged ED Patients – Psychiatric/Mental Health Patients) *This measure is only found in the downloadable database, it is not displayed on Hospital Care Compare |
| OP-22 | Percentage of patients who left the emergency department before being seen |
| OP-23 | Percentage of patients who came to the emergency department with stroke symptoms who received brain scan results within 45 minutes of arrival |
| OP-29 | Percentage of patients receiving appropriate recommendation for follow-up screening colonoscopy |
| OP-31 | Percentage of patients who had cataract surgery and had improvement in visual function within 90 days following the surgery |
| SEP-1 | Severe Sepsis and Septic Shock |
| SEP-SH-3HR | Septic Shock 3 Hour |
| SEP-SH-6HR | Septic Shock 6 Hour |
| SEV-SEP-3HR | Severe Sepsis 3 Hour |
| SEV-SEP-6HR | Severe Sepsis 6 Hour |
| STK-02 | Percentage of ischemic stroke patients prescribed or continuing to take antithrombotic therapy at hospital discharge |
| STK-03 | Percentage of ischemic stroke patients with atrial fibrillation/flutter who are prescribed or continuing to take anticoagulation therapy at hospital discharge |
| STK-05 | Percentage of ischemic stroke patients administered antithrombotic therapy by the end of hospital day 2 |
| STK-06 | Percentage of ischemic stroke patients who are prescribed or continuing to take statin medication at hospital discharge |
| VTE-1 | Percentage of patients that received VTE prophylaxis after hospital admission or surgery |
| VTE-2 | Percentage of patients that received VTE prophylaxis after being admitted to the intensive care unit (ICU) |
| Safe Use of Opioids | Percentage of patients who were prescribed 2 or more opioids or an opioid and benzodiazepine concurrently at discharge |
careClean <- Timely_and_Effective_Care %>%
  pivot_wider(
    names_from = MeasureId,
    values_from = c(Score),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(careClean)
## [1] 4677 26
head(careClean)
## # A tibble: 6 × 26
## FacilityName FacilityId State EDV ED_2_Strata_1 ED_2_Strata_2 HCP_COVID_19
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEA… 010001 AL high <NA> <NA> 80.7
## 2 MARSHALL MEDI… 010005 AL high 148 105 79.8
## 3 NORTH ALABAMA… 010006 AL high <NA> <NA> 79
## 4 MIZELL MEMORI… 010007 AL low <NA> <NA> 57.9
## 5 CRENSHAW COMM… 010008 AL low <NA> <NA> 81.2
## 6 ST. VINCENT'S… 010011 AL high <NA> <NA> 88
## # ℹ 19 more variables: IMM_3 <chr>, OP_18b <chr>, OP_18c <chr>, OP_22 <chr>,
## # OP_23 <chr>, OP_29 <chr>, OP_31 <chr>, SAFE_USE_OF_OPIOIDS <chr>,
## # SEP_1 <chr>, SEP_SH_3HR <chr>, SEP_SH_6HR <chr>, SEV_SEP_3HR <chr>,
## # SEV_SEP_6HR <chr>, STK_02 <chr>, STK_03 <chr>, STK_05 <chr>, STK_06 <chr>,
## # VTE_1 <chr>, VTE_2 <chr>
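The wide table above still stores every score as `<chr>`, including columns that hold numbers. One way to recover sensible types is base R's `type.convert()`, sketched below on a toy subset. The `OP_22` values are made up, and treating `FacilityId` as the only ID column is an assumption for the example.

```r
# Toy subset of the wide table, with every column stored as character
care <- data.frame(
  FacilityId   = c("010001", "010005", "010007"),
  EDV          = c("high", "high", "low"),
  HCP_COVID_19 = c("80.7", "79.8", "57.9"),
  OP_22        = c("3", NA, "1"),  # illustrative values only
  stringsAsFactors = FALSE
)

# Leave ID columns alone so that leading zeros are preserved
id_cols    <- "FacilityId"
score_cols <- setdiff(names(care), id_cols)

# Convert each score column to the most specific type that fits its values;
# as.is = TRUE keeps non-numeric columns as character instead of factor
care[score_cols] <- lapply(care[score_cols], type.convert, as.is = TRUE)

str(care)
```

Categorical measures such as `EDV` remain character and would still need to be encoded separately before modeling.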
# Display first 10 rows of Complications_and_Deaths
head(Complications_and_Deaths,10)
## # A tibble: 10 × 18
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 6 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 7 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 8 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 9 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 10 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## # ℹ 11 more variables: TelephoneNumber <chr>, MeasureId <chr>,
## # MeasureName <chr>, ComparedToNational <chr>, Denominator <chr>,
## # Score <chr>, LowerEstimate <chr>, HigherEstimate <chr>, Footnote <chr>,
## # StartDate <chr>, EndDate <chr>
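# Filter dataset to include numeric columns only
num_vars <- Complications_and_Deaths %>%
  select_if(is.numeric)
# Check for missing values (every column in this file was read as character,
# so no numeric columns are selected and the result is empty)
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)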
## named list()
# Replacing all "Not Applicable" and "Not Available" placeholders with NA
# Note: sapply() returns a character matrix, so every column is character afterwards
Complications_and_Deaths <- as.data.frame(sapply(Complications_and_Deaths, function(x) {
  if (is.character(x)) {
    x[x %in% c("Not Applicable", "Not Available")] <- NA
  }
  return(x)
}))
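The placeholder replacement above goes through `sapply()`, which simplifies its result to a character matrix, so any numeric column in the input would come back as character. A possible alternative is sketched below; `replace_sentinels()` and the toy data are hypothetical and are not part of the original analysis. The sketch touches only character columns and leaves other column types unchanged.

```r
library(dplyr)

# Hypothetical helper: turn placeholder strings into NA in character columns only
replace_sentinels <- function(df,
                              sentinels = c("Not Applicable", "Not Available")) {
  df %>%
    mutate(across(
      where(is.character),
      ~ replace(.x, .x %in% sentinels, NA_character_)
    ))
}

# Made-up data with one character and one numeric measure column
toy <- tibble(
  FacilityId = c("A001", "B002", "C003"),
  Score      = c("2.7", "Not Available", "Not Applicable"),
  Footnote   = c(NA, 5, 19)
)

toy_clean <- replace_sentinels(toy)
str(toy_clean)
```

In this sketch `Footnote` stays numeric, and both placeholder strings in `Score` become `NA` in a single pass.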
dictDeaths <- tribble(
~`Measure ID`, ~`Measure Name`,
"COMP-HIP-KNEE", "Rate of complications for hip/knee replacement patients",
"PSI 90", "Serious complications (this is a composite or summary measure; alternate Measure ID: PSI-90-SAFETY)",
"PSI 03", "Pressure sores (alternate Measure ID: PSI_3_Ulcer)",
"PSI 04", "Deaths among patients with serious treatable complications after surgery (alternate Measure ID: PSI-4-SURG-COMP)",
"PSI 06", "Collapsed lung due to medical treatment (alternate Measure ID: PSI-6-IAT-PTX)",
"PSI 08", "Broken hip from a fall after surgery (alternate Measure ID: PSI_8_POST_HIP)",
"PSI 09", "Postoperative hemorrhage or hematoma rate (alternate Measure ID: PSI_9_POST_HEM)",
"PSI 10", "Kidney and diabetic complications after surgery (alternate Measure ID: PSI_10_POST_KIDNEY)",
"PSI 11", "Respiratory failure after surgery (alternate Measure ID: PSI_11_POST_RESP)",
"PSI 12", "Serious blood clots after surgery (alternate Measure ID: PSI-12-POSTOP-PULMEMB-DVT)",
"PSI 13", "Blood stream infection after surgery (alternate Measure ID: PSI_13_POST_SEPSIS)",
"PSI 14", "A wound that splits open after surgery on the abdomen or pelvis (alternate Measure ID: PSI-14-POSTOP-DEHIS)",
"PSI 15", "Accidental cuts and tears from medical treatment (alternate Measure ID: PSI-15-ACC-LAC)",
"MORT-30-AMI", "Death rate for heart attack patients",
"MORT-30-CABG", "Death rate for Coronary Artery Bypass Graft (CABG) surgery patients",
"MORT-30-COPD", "Death rate for chronic obstructive pulmonary disease (COPD) patients",
"MORT-30-HF", "Death rate for heart failure patients",
"MORT-30-PN", "Death rate for pneumonia patients",
"MORT-30-STK", "Death rate for stroke patients"
)
dictDeaths %>%
  kable(
    format = "html",
    caption = "Table 5. Measure IDs and Measure Names from Complications and Deaths") %>%
  kable_styling(bootstrap_options = "hover", full_width = FALSE)
| Measure ID | Measure Name |
|---|---|
| COMP-HIP-KNEE | Rate of complications for hip/knee replacement patients |
| PSI 90 | Serious complications (this is a composite or summary measure; alternate Measure ID: PSI-90-SAFETY) |
| PSI 03 | Pressure sores (alternate Measure ID: PSI_3_Ulcer) |
| PSI 04 | Deaths among patients with serious treatable complications after surgery (alternate Measure ID: PSI-4-SURG-COMP) |
| PSI 06 | Collapsed lung due to medical treatment (alternate Measure ID: PSI-6-IAT-PTX) |
| PSI 08 | Broken hip from a fall after surgery (alternate Measure ID: PSI_8_POST_HIP) |
| PSI 09 | Postoperative hemorrhage or hematoma rate (alternate Measure ID: PSI_9_POST_HEM) |
| PSI 10 | Kidney and diabetic complications after surgery (alternate Measure ID: PSI_10_POST_KIDNEY) |
| PSI 11 | Respiratory failure after surgery (alternate Measure ID: PSI_11_POST_RESP) |
| PSI 12 | Serious blood clots after surgery (alternate Measure ID: PSI-12-POSTOP-PULMEMB-DVT) |
| PSI 13 | Blood stream infection after surgery (alternate Measure ID: PSI_13_POST_SEPSIS) |
| PSI 14 | A wound that splits open after surgery on the abdomen or pelvis (alternate Measure ID: PSI-14-POSTOP-DEHIS) |
| PSI 15 | Accidental cuts and tears from medical treatment (alternate Measure ID: PSI-15-ACC-LAC) |
| MORT-30-AMI | Death rate for heart attack patients |
| MORT-30-CABG | Death rate for Coronary Artery Bypass Graft (CABG) surgery patients |
| MORT-30-COPD | Death rate for chronic obstructive pulmonary disease (COPD) patients |
| MORT-30-HF | Death rate for heart failure patients |
| MORT-30-PN | Death rate for pneumonia patients |
| MORT-30-STK | Death rate for stroke patients |
deathsClean <- Complications_and_Deaths %>%
  pivot_wider(
    names_from = MeasureId,
    values_from = c(ComparedToNational, Score),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(deathsClean)
## [1] 4814 41
head(deathsClean)
## # A tibble: 6 × 41
## FacilityName FacilityId State ComparedToNational_C…¹ ComparedToNational_M…²
## <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEAL… 010001 AL No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005 AL No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006 AL No Different Than the… Worse Than the Nation…
## 4 MIZELL MEMORIA… 010007 AL Number of Cases Too S… Number of Cases Too S…
## 5 CRENSHAW COMMU… 010008 AL <NA> Number of Cases Too S…
## 6 ST. VINCENT'S … 010011 AL No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹ComparedToNational_COMP_HIP_KNEE,
## # ²ComparedToNational_MORT_30_AMI
## # ℹ 36 more variables: ComparedToNational_MORT_30_CABG <chr>,
## # ComparedToNational_MORT_30_COPD <chr>, ComparedToNational_MORT_30_HF <chr>,
## # ComparedToNational_MORT_30_PN <chr>, ComparedToNational_MORT_30_STK <chr>,
## # ComparedToNational_PSI_03 <chr>, ComparedToNational_PSI_04 <chr>,
## # ComparedToNational_PSI_06 <chr>, ComparedToNational_PSI_08 <chr>, …
# Display first 10 rows of Payment_and_Value_of_Care
head(Payment_and_Value_of_Care,10)
## # A tibble: 10 × 22
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 6 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 7 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 8 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 9 010006 NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL 35630 LAUDERDALE
## 10 010006 NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL 35630 LAUDERDALE
## # ℹ 15 more variables: TelephoneNumber <chr>, PaymentMeasureId <chr>,
## # PaymentMeasureName <chr>, PaymentCategory <chr>, Denominator <chr>,
## # Payment <chr>, LowerEstimate <chr>, HigherEstimate <chr>,
## # PaymentFootnote <dbl>, ValueOfCareDisplayId <chr>,
## # ValueOfCareDisplayName <chr>, ValueOfCareCategory <chr>,
## # ValueOfCareFootnote <dbl>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Payment_and_Value_of_Care %>%
select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## PaymentFootnote ValueOfCareFootnote
## 9956 10044
# Replacing all "Not Applicable" and "Not Available" placeholders with NA
# Note: sapply() returns a character matrix, so every column (including the
# numeric footnote columns) is character afterwards
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x %in% c("Not Applicable", "Not Available")] <- NA
  }
  return(x)
}))
dictPayment <- tribble(
~`Measure ID`, ~`Measure Name`,
"PAYM-30-AMI", "Payment for heart attack patients",
"PAYM-30-HF", "Payment for heart failure patients",
"PAYM-30-PN", "Payment for pneumonia patients",
"PAYM_90_HIP_KNEE", "Payment for hip/knee replacement patients"
)
dictPayment %>%
  kable(
    format = "html",
    caption = "Table 6. Measure IDs and Measure Names from Payment and Value of Care") %>%
  kable_styling(bootstrap_options = "hover", full_width = FALSE)
| Measure ID | Measure Name |
|---|---|
| PAYM-30-AMI | Payment for heart attack patients |
| PAYM-30-HF | Payment for heart failure patients |
| PAYM-30-PN | Payment for pneumonia patients |
| PAYM_90_HIP_KNEE | Payment for hip/knee replacement patients |
paymentClean <- Payment_and_Value_of_Care %>%
  pivot_wider(
    names_from = PaymentMeasureId,
    values_from = c(PaymentCategory, Payment),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(paymentClean)
## [1] 4645 11
head(paymentClean)
## # A tibble: 6 × 11
## FacilityName FacilityId State PaymentCategory_PAYM…¹ PaymentCategory_PAYM…²
## <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEAL… 010001 AL No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005 AL No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006 AL Greater Than the Nati… No Different Than the…
## 4 MIZELL MEMORIA… 010007 AL Number of Cases Too S… No Different Than the…
## 5 CRENSHAW COMMU… 010008 AL Number of Cases Too S… Number of Cases Too S…
## 6 ST. VINCENT'S … 010011 AL No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹PaymentCategory_PAYM_30_AMI,
## # ²PaymentCategory_PAYM_30_HF
## # ℹ 6 more variables: PaymentCategory_PAYM_30_PN <chr>,
## # PaymentCategory_PAYM_90_HIP_KNEE <chr>, Payment_PAYM_30_AMI <chr>,
## # Payment_PAYM_30_HF <chr>, Payment_PAYM_30_PN <chr>,
## # Payment_PAYM_90_HIP_KNEE <chr>
HipKneeClean <- readmissionsClean %>%
  full_join(HCAHPSClean, by = "FacilityId") %>%
  full_join(careClean, by = "FacilityId") %>%
  full_join(deathsClean, by = "FacilityId") %>%
  full_join(paymentClean, by = "FacilityId")
head(HipKneeClean)
## # A tibble: 6 × 451
## FacilityName.x FacilityId State.x NumberOfDischarges_HIP-KN…¹
## <chr> <chr> <chr> <dbl>
## 1 SOUTHEAST HEALTH MEDICAL CENTER 010001 AL NA
## 2 MARSHALL MEDICAL CENTERS 010005 AL NA
## 3 NORTH ALABAMA MEDICAL CENTER 010006 AL NA
## 4 MIZELL MEMORIAL HOSPITAL 010007 AL NA
## 5 CRENSHAW COMMUNITY HOSPITAL 010008 AL NA
## 6 ST. VINCENT'S EAST 010011 AL NA
## # ℹ abbreviated name: ¹`NumberOfDischarges_HIP-KNEE`
## # ℹ 447 more variables: `ExcessReadmissionRatio_HIP-KNEE` <dbl>,
## # `PredictedReadmissionRate_HIP-KNEE` <dbl>,
## # `ExpectedReadmissionRate_HIP-KNEE` <dbl>,
## # `NumberOfReadmissions_HIP-KNEE` <dbl>, FacilityName.y <chr>, State.y <chr>,
## # PatientSurveyStarRating_H_COMP_1_A_P <chr>,
## # PatientSurveyStarRating_H_COMP_1_SN_P <chr>, …
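The `.x` and `.y` columns in the output above appear because the tables share `FacilityName` and `State` but are joined on `FacilityId` alone. The sketch below uses made-up tables (`a`, `b`, and `facilities` are hypothetical names) to reproduce this and to show one possible way to avoid it: drop the descriptor columns before joining and attach them once from a single lookup. It assumes each `FacilityId` has the same name and state in every table.

```r
library(dplyr)

# Hypothetical tables that share FacilityId, FacilityName, and State
a <- tibble(
  FacilityId   = c("A001", "B002"),
  FacilityName = c("ALPHA", "BETA"),
  State        = c("AL", "AL"),
  x            = c(1, 2)
)
b <- tibble(
  FacilityId   = c("A001", "C003"),
  FacilityName = c("ALPHA", "GAMMA"),
  State        = c("AL", "AK"),
  y            = c(10, 30)
)

# Joining on FacilityId alone duplicates the other shared columns
names(full_join(a, b, by = "FacilityId"))

# Alternative: one descriptor lookup, then join only the measure columns
facilities <- bind_rows(a, b) %>%
  distinct(FacilityId, FacilityName, State)

joined <- a %>%
  select(-FacilityName, -State) %>%
  full_join(select(b, -FacilityName, -State), by = "FacilityId") %>%
  left_join(facilities, by = "FacilityId")

joined
```

With this layout every facility keeps a name and state even when it is missing from one of the source tables. If names differ between files for the same `FacilityId`, the lookup would contain more than one row per facility and would need to be deduplicated first.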
# Remove the duplicated FacilityName/State columns that the joins suffixed with .x or .y
HipKneeClean <- HipKneeClean %>%
  select(-matches("\\.(x|y)$"))
# Checking the dimensions
dim(HipKneeClean)
# Count NA values in each column
na_counts <- sapply(HipKneeClean, function(x) sum(is.na(x)))
# View the NA counts
print(na_counts)
# Calculate the percentage of NA values for each column
na_percentage <- sapply(HipKneeClean, function(x) mean(is.na(x)))
# Remove columns where more than 80% of the values are NA
HipKneeClean <- HipKneeClean[, na_percentage <= 0.8]
# Count NA values in each column
na_counts <- sapply(HipKneeClean, function(x) sum(is.na(x)))
# View the NA counts
print(na_counts)
# Check the dimensions
dim(HipKneeClean)
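The 80% missingness filter above can be checked on a small example. This is a minimal sketch with hypothetical toy data; it compares the base R subsetting used in the analysis with an equivalent `select(where())` form.

```r
library(dplyr)

# Hypothetical toy data with six rows
toy <- tibble(
  id     = 1:6,
  empty  = rep(NA_real_, 6),           # 100% missing
  sparse = c(NA, NA, NA, NA, NA, 6),   # about 83% missing
  usable = c(1, NA, 3, NA, 5, 6)       # about 33% missing
)

# Share of missing values per column
na_share <- sapply(toy, function(x) mean(is.na(x)))

# Base R subsetting, as in the analysis
kept_base <- toy[, na_share <= 0.8]

# Equivalent tidyselect form
kept_tidy <- toy %>%
  select(where(~ mean(is.na(.x)) <= 0.8))

names(kept_base)
```

Both forms drop `empty` and `sparse` and keep `id` and `usable`.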
# Remove columns containing 'AnswerPercent' or 'SurveyResponseRate'
HipKneeClean <- HipKneeClean %>%
  select(-matches("AnswerPercent|SurveyResponseRate"))
# Check the dimensions
dim(HipKneeClean)
## [1] 4816 87
# Remove columns containing 'ComparedToNational' or 'PaymentCategory'
HipKneeClean <- HipKneeClean %>%
  select(-matches("ComparedToNational|PaymentCategory"))
# Check the dimensions
dim(HipKneeClean)
## [1] 4816 67
str(HipKneeClean)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
## $ FacilityId : chr [1:4816] "010001" "010005" "010006" "010007" ...
## $ ExcessReadmissionRatio_HIP-KNEE : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
## $ PredictedReadmissionRate_HIP-KNEE : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
## $ ExpectedReadmissionRate_HIP-KNEE : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
## $ NumberOfReadmissions_HIP-KNEE : num [1:4816] 3 3 10 5 NA 6 10 9 NA 9 ...
## $ PatientSurveyStarRating_H_COMP_1_STAR_RATING : chr [1:4816] "3" "3" "2" "3" ...
## $ PatientSurveyStarRating_H_COMP_2_STAR_RATING : chr [1:4816] "4" "4" "3" "5" ...
## $ PatientSurveyStarRating_H_COMP_3_STAR_RATING : chr [1:4816] "3" "2" "2" "4" ...
## $ PatientSurveyStarRating_H_COMP_5_STAR_RATING : chr [1:4816] "3" "3" "2" "3" ...
## $ PatientSurveyStarRating_H_COMP_6_STAR_RATING : chr [1:4816] "4" "3" "3" "4" ...
## $ PatientSurveyStarRating_H_COMP_7_STAR_RATING : chr [1:4816] "4" "3" "2" "4" ...
## $ PatientSurveyStarRating_H_CLEAN_STAR_RATING : chr [1:4816] "3" "2" "1" "2" ...
## $ PatientSurveyStarRating_H_QUIET_STAR_RATING : chr [1:4816] "4" "4" "4" "4" ...
## $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: chr [1:4816] "4" "3" "2" "4" ...
## $ PatientSurveyStarRating_H_RECMND_STAR_RATING : chr [1:4816] "4" "3" "2" "4" ...
## $ PatientSurveyStarRating_H_STAR_RATING : chr [1:4816] "4" "3" "2" "4" ...
## $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE : chr [1:4816] "89" "90" "88" "91" ...
## $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE : chr [1:4816] "91" "92" "89" "95" ...
## $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE : chr [1:4816] "81" "75" "75" "88" ...
## $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE : chr [1:4816] "77" "76" "71" "77" ...
## $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE : chr [1:4816] "87" "86" "83" "87" ...
## $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE : chr [1:4816] "82" "79" "77" "82" ...
## $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE : chr [1:4816] "84" "80" "74" "80" ...
## $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE : chr [1:4816] "86" "85" "85" "87" ...
## $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : chr [1:4816] "89" "85" "82" "89" ...
## $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE : chr [1:4816] "90" "83" "79" "88" ...
## $ EDV : chr [1:4816] "high" "high" "high" "low" ...
## $ ED_2_Strata_1 : chr [1:4816] NA "148" NA NA ...
## $ HCP_COVID_19 : chr [1:4816] "80.7" "79.8" "79" "57.9" ...
## $ IMM_3 : chr [1:4816] "95" "80" "67" "53" ...
## $ OP_18b : chr [1:4816] "215" "147" "177" "130" ...
## $ OP_18c : chr [1:4816] "317" "266" NA "216" ...
## $ OP_22 : chr [1:4816] "5" "3" "1" "4" ...
## $ OP_23 : chr [1:4816] NA NA "69" NA ...
## $ OP_29 : chr [1:4816] "47" "96" "85" "23" ...
## $ SAFE_USE_OF_OPIOIDS : chr [1:4816] "14" "19" "17" NA ...
## $ SEP_1 : chr [1:4816] "66" "74" "56" "86" ...
## $ SEP_SH_3HR : chr [1:4816] "70" "88" "77" NA ...
## $ SEP_SH_6HR : chr [1:4816] "100" "91" "81" NA ...
## $ SEV_SEP_3HR : chr [1:4816] "79" "88" "78" "89" ...
## $ SEV_SEP_6HR : chr [1:4816] "95" "96" "86" "97" ...
## $ STK_02 : chr [1:4816] "98" "100" "96" NA ...
## $ STK_05 : chr [1:4816] NA "91" NA NA ...
## $ STK_06 : chr [1:4816] NA NA "97" NA ...
## $ VTE_1 : chr [1:4816] "98" NA NA NA ...
## $ VTE_2 : chr [1:4816] "99" NA "97" NA ...
## $ Score_COMP_HIP_KNEE : chr [1:4816] "2.7" "2.3" "4.6" NA ...
## $ Score_MORT_30_AMI : chr [1:4816] "12" "13.6" "16.5" NA ...
## $ Score_MORT_30_COPD : chr [1:4816] "8.8" "9.9" "9.9" "13.7" ...
## $ Score_MORT_30_HF : chr [1:4816] "8.9" "14.9" "12.5" "12.5" ...
## $ Score_MORT_30_PN : chr [1:4816] "18" "23.3" "19.5" "28.5" ...
## $ Score_MORT_30_STK : chr [1:4816] "14.8" "15.3" "17.2" NA ...
## $ Score_PSI_03 : chr [1:4816] "0.39" "0.94" "1.39" "0.42" ...
## $ Score_PSI_04 : chr [1:4816] "184.68" "183.49" "173.63" NA ...
## $ Score_PSI_06 : chr [1:4816] "0.23" "0.22" "0.36" "0.24" ...
## $ Score_PSI_08 : chr [1:4816] "0.10" "0.09" "0.08" "0.09" ...
## $ Score_PSI_09 : chr [1:4816] "2.39" "2.69" "5.43" "2.49" ...
## $ Score_PSI_10 : chr [1:4816] "1.14" "1.37" "1.26" "1.57" ...
## $ Score_PSI_11 : chr [1:4816] "13.83" "7.19" "7.37" "8.45" ...
## $ Score_PSI_12 : chr [1:4816] "4.49" "3.01" "3.36" "3.89" ...
## $ Score_PSI_13 : chr [1:4816] "8.05" "4.46" "4.37" "5.19" ...
## $ Score_PSI_14 : chr [1:4816] "1.69" "1.87" "1.76" NA ...
## $ Score_PSI_15 : chr [1:4816] "0.93" "0.91" "1.34" "1.08" ...
## $ Score_PSI_90 : chr [1:4816] "1.21" "0.97" "1.17" "0.95" ...
## $ FacilityName : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
## $ State : chr [1:4816] "AL" "AL" "AL" "AL" ...
## $ Payment_PAYM_90_HIP_KNEE : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
# Convert columns to numeric
HipKneeClean <- HipKneeClean %>%
  mutate_at(vars(starts_with("PatientSurveyStarRating_"),
                 starts_with("HcahpsLinearMeanValue_"),
                 starts_with("Score_"),
                 starts_with("ED_"),
                 starts_with("IMM_"),
                 starts_with("OP_"),
                 starts_with("SEP_"),
                 starts_with("SEV_"),
                 starts_with("STK_"),
                 starts_with("VTE_"),
                 starts_with("SAFE_"),
                 starts_with("HCP_")),
            ~ as.numeric(as.character(.)))
# View the structure
str(HipKneeClean)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
## $ FacilityId : chr [1:4816] "010001" "010005" "010006" "010007" ...
## $ ExcessReadmissionRatio_HIP-KNEE : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
## $ PredictedReadmissionRate_HIP-KNEE : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
## $ ExpectedReadmissionRate_HIP-KNEE : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
## $ NumberOfReadmissions_HIP-KNEE : num [1:4816] 3 3 10 5 NA 6 10 9 NA 9 ...
## $ PatientSurveyStarRating_H_COMP_1_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_2_STAR_RATING : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_3_STAR_RATING : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_5_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_6_STAR_RATING : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
## $ PatientSurveyStarRating_H_COMP_7_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_CLEAN_STAR_RATING : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
## $ PatientSurveyStarRating_H_QUIET_STAR_RATING : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
## $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_RECMND_STAR_RATING : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
## $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
## $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
## $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
## $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
## $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
## $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
## $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
## $ EDV : chr [1:4816] "high" "high" "high" "low" ...
## $ ED_2_Strata_1 : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
## $ HCP_COVID_19 : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
## $ IMM_3 : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
## $ OP_18b : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
## $ OP_18c : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
## $ OP_22 : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
## $ OP_23 : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
## $ OP_29 : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
## $ SAFE_USE_OF_OPIOIDS : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
## $ SEP_1 : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
## $ SEP_SH_3HR : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
## $ SEP_SH_6HR : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
## $ SEV_SEP_3HR : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
## $ SEV_SEP_6HR : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
## $ STK_02 : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
## $ STK_05 : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
## $ STK_06 : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
## $ VTE_1 : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
## $ VTE_2 : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
## $ Score_COMP_HIP_KNEE : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
## $ Score_MORT_30_AMI : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
## $ Score_MORT_30_COPD : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
## $ Score_MORT_30_HF : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
## $ Score_MORT_30_PN : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
## $ Score_MORT_30_STK : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
## $ Score_PSI_03 : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
## $ Score_PSI_04 : num [1:4816] 185 183 174 NA NA ...
## $ Score_PSI_06 : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
## $ Score_PSI_08 : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
## $ Score_PSI_09 : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
## $ Score_PSI_10 : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
## $ Score_PSI_11 : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
## $ Score_PSI_12 : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
## $ Score_PSI_13 : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
## $ Score_PSI_14 : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
## $ Score_PSI_15 : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
## $ Score_PSI_90 : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
## $ FacilityName : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
## $ State : chr [1:4816] "AL" "AL" "AL" "AL" ...
## $ Payment_PAYM_90_HIP_KNEE : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
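Coercing character columns with `as.numeric()` silently turns any unparseable string into `NA`, so it can be useful to count how many values were lost in the conversion. The sketch below uses hypothetical toy data and `across()`, the newer dplyr form of `mutate_at()`; it is an illustration, not the check used in the analysis.

```r
library(dplyr)

# Hypothetical toy data: one score column with a stray non-numeric string
toy <- tibble(
  Score_A = c("2.7", "4.6", NA, "N/A"),
  EDV     = c("high", "low", "high", "low")
)

# Convert only the Score_ columns; EDV stays character
converted <- toy %>%
  mutate(across(starts_with("Score_"), ~ suppressWarnings(as.numeric(.x))))

# Values that were present as text but became NA after coercion
introduced_na <- sum(!is.na(toy$Score_A) & is.na(converted$Score_A))
introduced_na
```

A non-zero count signals strings that are neither numbers nor already `NA`, such as a placeholder that was missed during cleaning.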
# Remove $ and , and convert to numeric
HipKneeClean <- HipKneeClean %>%
  mutate_at(vars(starts_with("Payment_")),
            ~ as.numeric(gsub("[\\$,]", "", .)))
# Checking the structure
str(HipKneeClean)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
## $ FacilityId : chr [1:4816] "010001" "010005" "010006" "010007" ...
## $ ExcessReadmissionRatio_HIP-KNEE : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
## $ PredictedReadmissionRate_HIP-KNEE : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
## $ ExpectedReadmissionRate_HIP-KNEE : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
## $ NumberOfReadmissions_HIP-KNEE : num [1:4816] 3 3 10 5 NA 6 10 9 NA 9 ...
## $ PatientSurveyStarRating_H_COMP_1_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_2_STAR_RATING : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_3_STAR_RATING : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_5_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_6_STAR_RATING : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
## $ PatientSurveyStarRating_H_COMP_7_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_CLEAN_STAR_RATING : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
## $ PatientSurveyStarRating_H_QUIET_STAR_RATING : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
## $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_RECMND_STAR_RATING : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
## $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
## $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
## $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
## $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
## $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
## $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
## $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
## $ EDV : chr [1:4816] "high" "high" "high" "low" ...
## $ ED_2_Strata_1 : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
## $ HCP_COVID_19 : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
## $ IMM_3 : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
## $ OP_18b : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
## $ OP_18c : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
## $ OP_22 : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
## $ OP_23 : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
## $ OP_29 : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
## $ SAFE_USE_OF_OPIOIDS : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
## $ SEP_1 : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
## $ SEP_SH_3HR : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
## $ SEP_SH_6HR : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
## $ SEV_SEP_3HR : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
## $ SEV_SEP_6HR : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
## $ STK_02 : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
## $ STK_05 : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
## $ STK_06 : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
## $ VTE_1 : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
## $ VTE_2 : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
## $ Score_COMP_HIP_KNEE : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
## $ Score_MORT_30_AMI : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
## $ Score_MORT_30_COPD : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
## $ Score_MORT_30_HF : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
## $ Score_MORT_30_PN : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
## $ Score_MORT_30_STK : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
## $ Score_PSI_03 : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
## $ Score_PSI_04 : num [1:4816] 185 183 174 NA NA ...
## $ Score_PSI_06 : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
## $ Score_PSI_08 : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
## $ Score_PSI_09 : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
## $ Score_PSI_10 : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
## $ Score_PSI_11 : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
## $ Score_PSI_12 : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
## $ Score_PSI_13 : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
## $ Score_PSI_14 : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
## $ Score_PSI_15 : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
## $ Score_PSI_90 : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
## $ FacilityName : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
## $ State : chr [1:4816] "AL" "AL" "AL" "AL" ...
## $ Payment_PAYM_90_HIP_KNEE : num [1:4816] 22212 18030 21898 NA NA ...
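The currency clean-up applied to the `Payment_` columns can be verified on a few values. This is a minimal sketch with a made-up vector (`payment_raw` is hypothetical) that uses the same `gsub()` pattern: the dollar sign and thousands separators are stripped before coercion, existing `NA` values stay `NA`, and any leftover text also becomes `NA`.

```r
# Hypothetical raw payment strings, including a missing value and a placeholder
payment_raw <- c("$22,212", "$18,030", NA, "Not Available")

# Same pattern as the analysis: strip "$" and "," and then coerce to numeric
payment_num <- suppressWarnings(as.numeric(gsub("[\\$,]", "", payment_raw)))

payment_num
```

The first two entries parse to 22212 and 18030, and the last two are `NA`.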
save(HipKneeClean, file = "HipKneeClean.RData")
# Select numeric columns
numeric_columns <- select_if(HipKneeClean, is.numeric)
# Calculate descriptive statistics
descr_stats <- psych::describe(numeric_columns)
# Convert to a data frame
descr_stats_df <- as.data.frame(descr_stats)
# Display the table
kable(descr_stats_df, format = "html", caption = "Table 7. Descriptive Statistics for Numeric Variables in Cleaned Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ExcessReadmissionRatio_HIP-KNEE | 1 | 1838 | 1.004161e+00 | 0.1263979 | 0.9921 | 1.000079e+00 | 0.1119363 | 0.6159 | 1.5162 | 0.9003 | 0.3862731 | 0.7447069 | 0.0029483 |
| PredictedReadmissionRate_HIP-KNEE | 2 | 1838 | 4.546552e+00 | 0.9092848 | 4.4768 | 4.511130e+00 | 0.8590184 | 1.9279 | 8.5690 | 6.6411 | 0.4370579 | 0.4866461 | 0.0212093 |
| ExpectedReadmissionRate_HIP-KNEE | 3 | 1838 | 4.519903e+00 | 0.6637697 | 4.4544 | 4.484779e+00 | 0.6165392 | 2.6749 | 7.6240 | 4.9491 | 0.6361300 | 1.0010780 | 0.0154826 |
| NumberOfReadmissions_HIP-KNEE | 4 | 1838 | 8.098477e+00 | 7.8400996 | 7.0000 | 6.813859e+00 | 4.4478000 | 1.0000 | 125.0000 | 124.0000 | 4.5178495 | 40.4373466 | 0.1828727 |
| PatientSurveyStarRating_H_COMP_1_STAR_RATING | 5 | 3255 | 3.260215e+00 | 1.0059133 | 3.0000 | 3.241843e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | 0.0239346 | -0.4825494 | 0.0176313 |
| PatientSurveyStarRating_H_COMP_2_STAR_RATING | 6 | 3255 | 3.428264e+00 | 0.9474515 | 3.0000 | 3.450672e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.3484208 | -0.0771131 | 0.0166066 |
| PatientSurveyStarRating_H_COMP_3_STAR_RATING | 7 | 3255 | 3.372350e+00 | 1.0909348 | 4.0000 | 3.388100e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.2839572 | -0.8381418 | 0.0191216 |
| PatientSurveyStarRating_H_COMP_5_STAR_RATING | 8 | 3255 | 3.064516e+00 | 0.9126664 | 3.0000 | 3.062572e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.0135291 | -0.3800413 | 0.0159969 |
| PatientSurveyStarRating_H_COMP_6_STAR_RATING | 9 | 3255 | 3.388940e+00 | 0.9148777 | 3.0000 | 3.401919e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.3167335 | 0.0370958 | 0.0160357 |
| PatientSurveyStarRating_H_COMP_7_STAR_RATING | 10 | 3255 | 3.167742e+00 | 0.9963691 | 3.0000 | 3.144338e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.0232912 | -0.4736871 | 0.0174640 |
| PatientSurveyStarRating_H_CLEAN_STAR_RATING | 11 | 3255 | 3.049770e+00 | 1.1197420 | 3.0000 | 3.063724e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.1031912 | -0.6984520 | 0.0196265 |
| PatientSurveyStarRating_H_QUIET_STAR_RATING | 12 | 3255 | 3.214132e+00 | 1.1166932 | 3.0000 | 3.228791e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.1265578 | -0.6910097 | 0.0195730 |
| PatientSurveyStarRating_H_HSP_RATING_STAR_RATING | 13 | 3255 | 3.243318e+00 | 0.9195166 | 3.0000 | 3.268330e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.2786674 | 0.0381366 | 0.0161170 |
| PatientSurveyStarRating_H_RECMND_STAR_RATING | 14 | 3255 | 3.497696e+00 | 1.0287408 | 4.0000 | 3.554702e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.6160258 | -0.1587475 | 0.0180314 |
| PatientSurveyStarRating_H_STAR_RATING | 15 | 3255 | 3.295545e+00 | 0.9142197 | 3.0000 | 3.294818e+00 | 1.4826000 | 1.0000 | 5.0000 | 4.0000 | -0.1485226 | -0.2457959 | 0.0160242 |
| HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE | 16 | 3255 | 9.049002e+01 | 2.9117012 | 91.0000 | 9.063608e+01 | 2.9652000 | 77.0000 | 100.0000 | 23.0000 | -0.6770638 | 1.4141442 | 0.0510354 |
| HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE | 17 | 3255 | 9.028879e+01 | 2.8233453 | 90.0000 | 9.039079e+01 | 2.9652000 | 76.0000 | 100.0000 | 24.0000 | -0.4886223 | 1.1133723 | 0.0494867 |
| HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE | 18 | 3255 | 8.276897e+01 | 5.3062993 | 83.0000 | 8.280499e+01 | 4.4478000 | 61.0000 | 100.0000 | 39.0000 | -0.1793486 | 0.4429645 | 0.0930071 |
| HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE | 19 | 3255 | 7.554593e+01 | 5.1600947 | 75.0000 | 7.547178e+01 | 4.4478000 | 51.0000 | 99.0000 | 48.0000 | 0.0989232 | 0.6200355 | 0.0904445 |
| HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE | 20 | 3255 | 8.568786e+01 | 4.1729980 | 86.0000 | 8.596238e+01 | 2.9652000 | 59.0000 | 100.0000 | 41.0000 | -0.9067227 | 2.2178795 | 0.0731430 |
| HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE | 21 | 3255 | 8.040277e+01 | 3.2911839 | 81.0000 | 8.046795e+01 | 2.9652000 | 64.0000 | 97.0000 | 33.0000 | -0.2792695 | 1.0404404 | 0.0576868 |
| HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE | 22 | 3255 | 8.560799e+01 | 4.7038213 | 86.0000 | 8.575931e+01 | 4.4478000 | 68.0000 | 99.0000 | 31.0000 | -0.3531253 | 0.1936870 | 0.0824471 |
| HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE | 23 | 3255 | 8.172657e+01 | 5.6927220 | 82.0000 | 8.193282e+01 | 5.9304000 | 56.0000 | 99.0000 | 43.0000 | -0.3843168 | 0.2536999 | 0.0997802 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 24 | 3255 | 8.709708e+01 | 4.0649053 | 88.0000 | 8.730940e+01 | 2.9652000 | 65.0000 | 98.0000 | 33.0000 | -0.6979236 | 1.3325347 | 0.0712484 |
| HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE | 25 | 3255 | 8.607343e+01 | 5.2109599 | 87.0000 | 8.638349e+01 | 4.4478000 | 57.0000 | 99.0000 | 42.0000 | -0.7407627 | 1.2385658 | 0.0913361 |
| ED_2_Strata_1 | 26 | 1107 | 1.064002e+02 | 114.0608548 | 74.0000 | 8.602368e+01 | 50.4084000 | 0.0000 | 1078.0000 | 1078.0000 | 3.9796326 | 22.8953094 | 3.4281736 |
| HCP_COVID_19 | 27 | 3633 | 8.767556e+01 | 10.6218376 | 90.1000 | 8.903068e+01 | 9.4886400 | 0.5000 | 100.0000 | 99.5000 | -1.4380632 | 3.5511860 | 0.1762248 |
| IMM_3 | 28 | 4140 | 7.782681e+01 | 18.5753061 | 83.0000 | 8.024245e+01 | 17.7912000 | 0.0000 | 100.0000 | 100.0000 | -1.0622115 | 0.7180171 | 0.2886927 |
| OP_18b | 29 | 4067 | 1.617780e+02 | 54.6367977 | 153.0000 | 1.572833e+02 | 53.3736000 | 38.0000 | 587.0000 | 549.0000 | 0.9959452 | 2.0284358 | 0.8567382 |
| OP_18c | 30 | 3098 | 2.967434e+02 | 177.2966416 | 255.0000 | 2.700827e+02 | 117.1254000 | 40.0000 | 2954.0000 | 2914.0000 | 3.5082626 | 28.6044027 | 3.1853694 |
| OP_22 | 31 | 3841 | 2.385056e+00 | 2.3270098 | 2.0000 | 2.028637e+00 | 1.4826000 | 0.0000 | 19.0000 | 19.0000 | 1.7797727 | 4.6722341 | 0.0375471 |
| OP_23 | 32 | 1535 | 7.062801e+01 | 19.2269197 | 74.0000 | 7.247193e+01 | 17.7912000 | 0.0000 | 100.0000 | 100.0000 | -0.9377173 | 0.8863709 | 0.4907446 |
| OP_29 | 33 | 2830 | 9.125230e+01 | 14.2952983 | 96.0000 | 9.458083e+01 | 5.9304000 | 0.0000 | 100.0000 | 100.0000 | -3.1390450 | 11.9252917 | 0.2687200 |
| SAFE_USE_OF_OPIOIDS | 34 | 3670 | 1.561226e+01 | 5.6808277 | 15.0000 | 1.537568e+01 | 4.4478000 | 0.0000 | 45.0000 | 45.0000 | 0.6530895 | 2.0207900 | 0.0937732 |
| SEP_1 | 35 | 3097 | 5.982661e+01 | 16.7144073 | 61.0000 | 6.045180e+01 | 16.3086000 | 0.0000 | 100.0000 | 100.0000 | -0.4029034 | 0.1050294 | 0.3003450 |
| SEP_SH_3HR | 36 | 2620 | 6.724809e+01 | 17.8935243 | 68.0000 | 6.776813e+01 | 19.2738000 | 0.0000 | 100.0000 | 100.0000 | -0.2914787 | -0.2730680 | 0.3495789 |
| SEP_SH_6HR | 37 | 2039 | 8.305983e+01 | 15.4244924 | 87.0000 | 8.529455e+01 | 11.8608000 | 7.0000 | 100.0000 | 93.0000 | -1.5023285 | 2.7030480 | 0.3415877 |
| SEV_SEP_3HR | 38 | 3086 | 7.904342e+01 | 11.1773414 | 81.0000 | 7.998907e+01 | 10.3782000 | 0.0000 | 100.0000 | 100.0000 | -1.4060736 | 4.9567276 | 0.2012058 |
| SEV_SEP_6HR | 39 | 2937 | 8.871263e+01 | 11.3390205 | 92.0000 | 9.061974e+01 | 7.4130000 | 0.0000 | 100.0000 | 100.0000 | -2.4051707 | 8.9993568 | 0.2092298 |
| STK_02 | 40 | 1537 | 9.529733e+01 | 6.1635225 | 97.0000 | 9.638262e+01 | 2.9652000 | 23.0000 | 100.0000 | 77.0000 | -4.7304997 | 35.8413538 | 0.1572143 |
| STK_05 | 41 | 1094 | 9.278702e+01 | 7.1402994 | 94.0000 | 9.367352e+01 | 4.4478000 | 2.0000 | 100.0000 | 98.0000 | -5.4512524 | 54.5350543 | 0.2158777 |
| STK_06 | 42 | 1298 | 9.464946e+01 | 7.5064354 | 96.0000 | 9.581154e+01 | 2.9652000 | 0.0000 | 100.0000 | 100.0000 | -7.3728956 | 77.6048723 | 0.2083514 |
| VTE_1 | 43 | 2216 | 8.246435e+01 | 19.1503872 | 89.0000 | 8.603777e+01 | 11.8608000 | 0.0000 | 100.0000 | 100.0000 | -1.7514646 | 3.0387995 | 0.4068110 |
| VTE_2 | 44 | 1413 | 9.383015e+01 | 9.7362977 | 97.0000 | 9.588241e+01 | 2.9652000 | 3.0000 | 100.0000 | 97.0000 | -4.1429487 | 23.6985405 | 0.2590137 |
| Score_COMP_HIP_KNEE | 45 | 2090 | 3.182392e+00 | 0.5482694 | 3.1000 | 3.150419e+00 | 0.4447800 | 1.6000 | 6.2000 | 4.6000 | 0.7716603 | 1.9037431 | 0.0119928 |
| Score_MORT_30_AMI | 46 | 1943 | 1.254359e+01 | 1.1553168 | 12.5000 | 1.251608e+01 | 1.0378200 | 8.9000 | 17.1000 | 8.2000 | 0.2785565 | 0.5897728 | 0.0262099 |
| Score_MORT_30_COPD | 47 | 2569 | 9.185286e+00 | 1.3614554 | 9.1000 | 9.121196e+00 | 1.3343400 | 5.2000 | 14.9000 | 9.7000 | 0.5044944 | 0.5326934 | 0.0268609 |
| Score_MORT_30_HF | 48 | 3056 | 1.182863e+01 | 1.9384358 | 11.8000 | 1.180581e+01 | 1.7791200 | 5.5000 | 20.4000 | 14.9000 | 0.1359787 | 0.4028740 | 0.0350651 |
| Score_MORT_30_PN | 49 | 3514 | 1.833056e+01 | 2.5441335 | 18.2000 | 1.826543e+01 | 2.3721600 | 8.6000 | 29.5000 | 20.9000 | 0.3130182 | 0.5748975 | 0.0429180 |
| Score_MORT_30_STK | 50 | 2123 | 1.379157e+01 | 1.8194129 | 13.7000 | 1.371648e+01 | 1.7791200 | 8.0000 | 21.9000 | 13.9000 | 0.4400162 | 0.5676934 | 0.0394872 |
| Score_PSI_03 | 51 | 3169 | 5.805491e-01 | 0.4702323 | 0.4800 | 5.037288e-01 | 0.2372160 | 0.0500 | 6.3100 | 6.2600 | 4.0520349 | 30.4735061 | 0.0083532 |
| Score_PSI_04 | 52 | 1609 | 1.687290e+02 | 21.3153769 | 167.7400 | 1.687267e+02 | 20.2523160 | 86.6800 | 241.8100 | 155.1300 | -0.0315953 | 0.4882789 | 0.5313920 |
| Score_PSI_06 | 53 | 3188 | 2.476851e-01 | 0.0402023 | 0.2400 | 2.442712e-01 | 0.0296520 | 0.1200 | 0.5100 | 0.3900 | 1.1937679 | 3.4906535 | 0.0007120 |
| Score_PSI_08 | 54 | 3189 | 9.043270e-02 | 0.0070889 | 0.0900 | 9.019980e-02 | 0.0000000 | 0.0600 | 0.1300 | 0.0700 | 0.5878153 | 2.8393103 | 0.0001255 |
| Score_PSI_09 | 55 | 2930 | 2.508707e+00 | 0.4395922 | 2.4600 | 2.478486e+00 | 0.2668680 | 1.1000 | 6.1000 | 5.0000 | 1.3622221 | 5.9305415 | 0.0081211 |
| Score_PSI_10 | 56 | 2593 | 1.569626e+00 | 0.3418816 | 1.5300 | 1.535055e+00 | 0.1186080 | 0.4700 | 4.5500 | 4.0800 | 1.9801686 | 8.7853292 | 0.0067139 |
| Score_PSI_11 | 57 | 2603 | 9.045517e+00 | 3.2148329 | 8.3900 | 8.740322e+00 | 2.1201180 | 2.7300 | 66.8500 | 64.1200 | 4.3362544 | 54.8289666 | 0.0630117 |
| Score_PSI_12 | 58 | 2935 | 3.597278e+00 | 0.7194093 | 3.5000 | 3.542005e+00 | 0.5633880 | 1.6100 | 7.5100 | 5.9000 | 1.0157663 | 2.2968831 | 0.0132792 |
| Score_PSI_13 | 59 | 2549 | 5.298133e+00 | 0.9887454 | 5.1300 | 5.224669e+00 | 0.7116480 | 2.1700 | 13.4900 | 11.3200 | 1.1662305 | 4.3263395 | 0.0195839 |
| Score_PSI_14 | 60 | 2592 | 2.010590e+00 | 0.3338405 | 1.9400 | 1.969769e+00 | 0.1482600 | 0.8900 | 4.4000 | 3.5100 | 1.9779060 | 7.1675818 | 0.0065572 |
| Score_PSI_15 | 61 | 2916 | 1.101708e+00 | 0.2939729 | 1.0500 | 1.067549e+00 | 0.1630860 | 0.3500 | 3.4300 | 3.0800 | 1.8219347 | 6.3874487 | 0.0054439 |
| Score_PSI_90 | 62 | 3011 | 1.001588e+00 | 0.1793301 | 0.9700 | 9.839477e-01 | 0.1186080 | 0.5500 | 2.7400 | 2.1900 | 2.0610890 | 10.6309961 | 0.0032681 |
| Payment_PAYM_90_HIP_KNEE | 63 | 2001 | 2.105813e+04 | 2079.2072318 | 20899.0000 | 2.093031e+04 | 1756.8810000 | 15936.0000 | 48153.0000 | 32217.0000 | 1.7757439 | 15.5385065 | 46.4808683 |
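The last column of the descriptive table can be sanity-checked by hand. The sketch below assumes the columns follow the usual `psych::describe()` order (vars, n, mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, se), which is an assumption because the header row is not shown here. Under that reading, the final value in each row should equal the standard deviation divided by the square root of n.

```r
# Values copied from the PatientSurveyStarRating_H_COMP_2_STAR_RATING row
# (column meanings are assumed, see the note above)
n <- 3255
sd_value <- 0.9474515
se_reported <- 0.0166066

# Standard error of the mean: sd / sqrt(n)
se_check <- sd_value / sqrt(n)

# Compare the recomputed value with the one printed in the table
print(round(se_check, 7))
print(abs(se_check - se_reported) < 1e-6)
```

The same check applies to any other row, and a large discrepancy would suggest the columns have been misread.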
# Visualizing the distribution of EDV (Emergency Department Volume)
ggplot(HipKneeClean, aes(x = EDV)) +
  geom_bar(fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Figure 1. Distribution of Emergency Department Volume",
       x = "EDV",
       y = "Count") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
# Data preparation: count facilities per state
facility_counts <- HipKneeClean %>%
  group_by(State) %>%
  summarise(Count = n(), .groups = 'drop')
# Check the first few rows
head(facility_counts)
## # A tibble: 6 × 2
## State Count
## <chr> <int>
## 1 AK 21
## 2 AL 88
## 3 AR 79
## 4 AS 1
## 5 AZ 82
## 6 CA 327
# Get state boundaries
states_map <- map_data("state")
# Create a mapping from state abbreviations to full state names.
# state.abb covers only the 50 states, so DC is added by hand to match
# the "district of columbia" region in map_data("state").
state_mapping <- data.frame(
  State = c(state.abb, "DC"),
  full_state_name = c(tolower(state.name), "district of columbia"),
  stringsAsFactors = FALSE
)
# Add full state names to facility_counts
# (inner join: codes without a mapping, such as AS, are dropped)
facility_counts <- merge(facility_counts, state_mapping, by = "State")
# Join facility counts with state map data
facility_map_data <- left_join(states_map, facility_counts, by = c("region" = "full_state_name"))
# Replace NA values with 0 in the Count column
facility_map_data$Count[is.na(facility_map_data$Count)] <- 0
# Plot the map with facility counts
ggplot(data = facility_map_data) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Count), color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50", name = "Facility Count") +
  theme_minimal() +
  labs(title = "Figure 2. Number of Facilities per State") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        plot.background = element_blank())
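The built-in `state.abb` vector holds only the 50 states, and `map_data("state")` draws only the contiguous states plus the District of Columbia, so some facility rows may never reach the map. A minimal base-R sketch of a check for codes that fall outside a lookup is shown below. `sample_codes` is a hypothetical vector: its first six values mirror the `head(facility_counts)` output, and "DC" is added purely for illustration.

```r
# Hypothetical sample of state codes; only the first six come from the
# head(facility_counts) output, "DC" is an illustrative assumption.
sample_codes <- c("AK", "AL", "AR", "AS", "AZ", "CA", "DC")

# state.abb is a base R constant holding the 50 state abbreviations
unmatched <- setdiff(sample_codes, state.abb)

# Codes listed here have no entry in a lookup built from state.abb alone
print(unmatched)
```

Running the same `setdiff()` on the real `State` column before merging shows which territories or districts would be silently dropped by an inner join.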
# Rename the column so it can be referenced without backticks
HipKneeClean <- HipKneeClean %>%
  rename(PredictedReadmissionRate_HIP_KNEE = `PredictedReadmissionRate_HIP-KNEE`)
# Calculate the average PredictedReadmissionRate_HIP_KNEE per state
average_readmission_rate <- HipKneeClean %>%
  group_by(State) %>%
  summarize(Average_PredictedReadmissionRate_HIP_KNEE = mean(PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE))
# Add full state names to the average readmission rate data
average_readmission_rate <- merge(average_readmission_rate, state_mapping, by = "State")
# Join average readmission rate with state map data
readmission_map_data <- left_join(states_map, average_readmission_rate, by = c("region" = "full_state_name"))
# Plot the map with average readmission rates
ggplot(data = readmission_map_data) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Average_PredictedReadmissionRate_HIP_KNEE), color = "white") +
  scale_fill_gradient(low = "lightgreen", high = "darkgreen", name = "Average Predicted\nReadmission Rate") +
  theme_minimal() +
  labs(title = "Figure 3. Average Predicted Readmission Rate for Hip/Knee Replacement per State") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        plot.background = element_blank())
# Create a histogram of PredictedReadmissionRate_HIP_KNEE
ggplot(HipKneeClean, aes(x = PredictedReadmissionRate_HIP_KNEE)) +
  geom_histogram(binwidth = 0.25, fill = "skyblue", color = "black") +
  labs(title = "Figure 4. Histogram of Predicted Readmission Rate for Hip/Knee Replacement",
       x = "Predicted Readmission Rate",
       y = "Frequency") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
## Warning: Removed 2978 rows containing non-finite values (`stat_bin()`).
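The warning reports how many rows `stat_bin()` dropped because the plotted value was missing or otherwise non-finite. A minimal sketch of how that count can be reproduced, and how the rows can be filtered before plotting, is given below. The vector `rate` is hypothetical toy data, not the HRRP values.

```r
# Hypothetical readmission rates with missing entries (not the HRRP data)
rate <- c(4.1, NA, 3.8, NA, 5.2, 4.6, NA)

# Rows that stat_bin() would drop: anything that is NA, NaN, or infinite
n_dropped <- sum(!is.finite(rate))

# Keeping only finite values before plotting avoids the warning
rate_complete <- rate[is.finite(rate)]

print(n_dropped)
print(length(rate_complete))
```

Filtering explicitly makes the exclusion visible in the code instead of leaving it to a plotting warning.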
# Calculate missing values per variable (count and percentage of rows)
missing_values_summary <- HipKneeClean %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeClean)) * 100)
# Print the table using kable
missing_values_summary %>%
  kable(caption = "Table 7. Missing Values Summary") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| Variable | Missing_Count | Missing_Percentage |
|---|---|---|
| FacilityId | 0 | 0.000000 |
| ExcessReadmissionRatio_HIP-KNEE | 2978 | 61.835548 |
| PredictedReadmissionRate_HIP_KNEE | 2978 | 61.835548 |
| ExpectedReadmissionRate_HIP-KNEE | 2978 | 61.835548 |
| NumberOfReadmissions_HIP-KNEE | 2978 | 61.835548 |
| PatientSurveyStarRating_H_COMP_1_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_COMP_2_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_COMP_3_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_COMP_5_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_COMP_6_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_COMP_7_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_CLEAN_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_QUIET_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_HSP_RATING_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_RECMND_STAR_RATING | 1561 | 32.412791 |
| PatientSurveyStarRating_H_STAR_RATING | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 1561 | 32.412791 |
| HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE | 1561 | 32.412791 |
| EDV | 972 | 20.182724 |
| ED_2_Strata_1 | 3709 | 77.014120 |
| HCP_COVID_19 | 1183 | 24.563954 |
| IMM_3 | 676 | 14.036545 |
| OP_18b | 749 | 15.552326 |
| OP_18c | 1718 | 35.672758 |
| OP_22 | 975 | 20.245017 |
| OP_23 | 3281 | 68.127076 |
| OP_29 | 1986 | 41.237541 |
| SAFE_USE_OF_OPIOIDS | 1146 | 23.795681 |
| SEP_1 | 1719 | 35.693522 |
| SEP_SH_3HR | 2196 | 45.598007 |
| SEP_SH_6HR | 2777 | 57.661960 |
| SEV_SEP_3HR | 1730 | 35.921927 |
| SEV_SEP_6HR | 1879 | 39.015781 |
| STK_02 | 3279 | 68.085548 |
| STK_05 | 3722 | 77.284053 |
| STK_06 | 3518 | 73.048173 |
| VTE_1 | 2600 | 53.986711 |
| VTE_2 | 3403 | 70.660299 |
| Score_COMP_HIP_KNEE | 2726 | 56.602990 |
| Score_MORT_30_AMI | 2873 | 59.655316 |
| Score_MORT_30_COPD | 2247 | 46.656977 |
| Score_MORT_30_HF | 1760 | 36.544851 |
| Score_MORT_30_PN | 1302 | 27.034884 |
| Score_MORT_30_STK | 2693 | 55.917774 |
| Score_PSI_03 | 1647 | 34.198505 |
| Score_PSI_04 | 3207 | 66.590532 |
| Score_PSI_06 | 1628 | 33.803987 |
| Score_PSI_08 | 1627 | 33.783223 |
| Score_PSI_09 | 1886 | 39.161130 |
| Score_PSI_10 | 2223 | 46.158638 |
| Score_PSI_11 | 2213 | 45.950997 |
| Score_PSI_12 | 1881 | 39.057309 |
| Score_PSI_13 | 2267 | 47.072259 |
| Score_PSI_14 | 2224 | 46.179402 |
| Score_PSI_15 | 1900 | 39.451827 |
| Score_PSI_90 | 1805 | 37.479236 |
| FacilityName | 171 | 3.550664 |
| State | 171 | 3.550664 |
| Payment_PAYM_90_HIP_KNEE | 2815 | 58.450997 |
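Several variables in Table 7 are missing for more than half of the facilities. One way to act on such a summary is to flag columns above a chosen cutoff. The sketch below uses a hypothetical data frame `toy` and an assumed 50% threshold. Neither is taken from the analysis itself.

```r
# Hypothetical data frame standing in for HipKneeClean
toy <- data.frame(
  a = c(1, NA, NA, NA),   # 75% missing
  b = c(1, 2, NA, 4),     # 25% missing
  c = c(1, 2, 3, 4)       # 0% missing
)

# Percentage of missing values per column
missing_pct <- colMeans(is.na(toy)) * 100

# Assumed cutoff, chosen for illustration only
threshold <- 50
high_missing <- names(missing_pct)[missing_pct > threshold]

print(missing_pct)
print(high_missing)
```

Whether flagged columns should be dropped, imputed, or kept depends on the modelling step that follows, so the cutoff is best treated as a tunable parameter.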
# Compute correlation matrix (pairwise deletion: each coefficient uses
# only the rows where both variables are present)
cor_matrix <- cor(HipKneeClean %>% select(where(is.numeric)), use = "pairwise.complete.obs")
# Melt the correlation matrix into a long format (melt() is from reshape2)
cor_melted <- melt(cor_matrix)
# Plot the heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")
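With more than sixty variables, a heatmap makes it hard to pick out individual pairs. A short sketch of how strongly correlated pairs could be listed from a correlation matrix follows. The matrix `toy_cor` and the 0.9 cutoff are hypothetical and chosen for illustration, not values from the HRRP data.

```r
# Hypothetical 3 x 3 correlation matrix (not values from the HRRP data)
vars <- c("x", "y", "z")
toy_cor <- matrix(c(1.00, 0.95, 0.10,
                    0.95, 1.00, 0.20,
                    0.10, 0.20, 1.00),
                  nrow = 3, dimnames = list(vars, vars))

# Look only at the upper triangle so each pair is listed once,
# and keep pairs whose absolute correlation exceeds the assumed cutoff
idx <- which(upper.tri(toy_cor) & abs(toy_cor) > 0.9, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(toy_cor)[idx[, "row"]],
  var2 = colnames(toy_cor)[idx[, "col"]],
  r    = toy_cor[idx]
)
print(high_pairs)
```

Applied to `cor_matrix`, the same pattern would surface near-duplicate predictors, such as a star rating and its matching linear score, which is useful to know before fitting a regression model.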
# Convert the correlation matrix to a data frame
cor_table <- as.data.frame(cor_matrix)
# Add variable names as a column for better readability
cor_table$Variable <- rownames(cor_table)
# Reorder columns for better readability
cor_table <- cor_table %>%
  select(Variable, everything())
# Print the table using kable
cor_table %>%
  kable(caption = "Table 8. Correlation Coefficients Table") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| | Variable | ExcessReadmissionRatio_HIP-KNEE | PredictedReadmissionRate_HIP_KNEE | ExpectedReadmissionRate_HIP-KNEE | NumberOfReadmissions_HIP-KNEE | PatientSurveyStarRating_H_COMP_1_STAR_RATING | PatientSurveyStarRating_H_COMP_2_STAR_RATING | PatientSurveyStarRating_H_COMP_3_STAR_RATING | PatientSurveyStarRating_H_COMP_5_STAR_RATING | PatientSurveyStarRating_H_COMP_6_STAR_RATING | PatientSurveyStarRating_H_COMP_7_STAR_RATING | PatientSurveyStarRating_H_CLEAN_STAR_RATING | PatientSurveyStarRating_H_QUIET_STAR_RATING | PatientSurveyStarRating_H_HSP_RATING_STAR_RATING | PatientSurveyStarRating_H_RECMND_STAR_RATING | PatientSurveyStarRating_H_STAR_RATING | HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE | HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE | HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE | HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE | ED_2_Strata_1 | HCP_COVID_19 | IMM_3 | OP_18b | OP_18c | OP_22 | OP_23 | OP_29 | SAFE_USE_OF_OPIOIDS | SEP_1 | SEP_SH_3HR | SEP_SH_6HR | SEV_SEP_3HR | SEV_SEP_6HR | STK_02 | STK_05 | STK_06 | VTE_1 | VTE_2 | Score_COMP_HIP_KNEE | Score_MORT_30_AMI | Score_MORT_30_COPD | Score_MORT_30_HF | Score_MORT_30_PN | Score_MORT_30_STK | Score_PSI_03 | Score_PSI_04 | Score_PSI_06 | Score_PSI_08 | Score_PSI_09 | Score_PSI_10 | Score_PSI_11 | Score_PSI_12 | Score_PSI_13 | Score_PSI_14 | Score_PSI_15 | Score_PSI_90 | Payment_PAYM_90_HIP_KNEE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ExcessReadmissionRatio_HIP-KNEE | ExcessReadmissionRatio_HIP-KNEE | 1.0000000 | 0.6851738 | 0.0934639 | 0.0280292 | -0.1590972 | -0.1749847 | -0.1597408 | -0.1551935 | -0.1771868 | -0.1941803 | -0.1160735 | -0.1047116 | -0.1709175 | -0.1659007 | -0.1783759 | -0.1666746 | -0.1802262 | -0.1831908 | -0.1775098 | -0.1844376 | -0.1877798 | -0.1223047 | -0.1123794 | -0.1779322 | -0.1852873 | 0.0762327 | -0.0508152 | -0.0583326 | 0.0505288 | 0.0838395 | 0.0252830 | 0.0485500 | -0.0296026 | 0.0627005 | -0.0309171 | -0.0529663 | 0.0313734 | -0.0322822 | -0.0278551 | -0.0434539 | -0.0205857 | -0.0425563 | 0.0039309 | 0.0707936 | 0.4350513 | 0.0392653 | -0.0452304 | -0.0577013 | -0.0046351 | -0.0025521 | -0.0058883 | 0.0129772 | 0.0156305 | 0.0386634 | -0.0459321 | 0.0099239 | 0.1103031 | 0.0939499 | 0.1262199 | -0.0138581 | -0.0019014 | 0.0882354 | 0.2740999 |
| PredictedReadmissionRate_HIP_KNEE | PredictedReadmissionRate_HIP_KNEE | 0.6851738 | 1.0000000 | 0.7840403 | -0.0298799 | -0.2144148 | -0.2287902 | -0.2138329 | -0.2191170 | -0.2029456 | -0.2247172 | -0.1981594 | -0.1748506 | -0.2016141 | -0.1642097 | -0.2308937 | -0.2067264 | -0.2352248 | -0.2449572 | -0.2546502 | -0.2079982 | -0.2250397 | -0.2045779 | -0.1801665 | -0.2060912 | -0.1911254 | 0.1082272 | -0.0563082 | -0.0028840 | 0.1295727 | 0.0877808 | 0.0501727 | 0.0614226 | -0.0106510 | 0.1063002 | -0.0326387 | -0.0689850 | 0.0231777 | -0.0375511 | -0.0014846 | 0.0066866 | -0.0801055 | -0.0101158 | 0.0654668 | 0.1064994 | 0.3208550 | 0.0074065 | -0.0794948 | -0.1067828 | -0.0985660 | -0.0376746 | -0.0037334 | -0.0449077 | 0.0154891 | -0.0214412 | -0.0182303 | 0.0710046 | 0.1130121 | 0.1047402 | 0.1193336 | 0.0140012 | -0.0158282 | 0.0973882 | 0.2975679 |
| ExpectedReadmissionRate_HIP-KNEE | ExpectedReadmissionRate_HIP-KNEE | 0.0934639 | 0.7840403 | 1.0000000 | -0.0742574 | -0.1696704 | -0.1755397 | -0.1644998 | -0.1788949 | -0.1353119 | -0.1533717 | -0.1829249 | -0.1621036 | -0.1408372 | -0.0923845 | -0.1744075 | -0.1524571 | -0.1798454 | -0.1926175 | -0.2093859 | -0.1348924 | -0.1608376 | -0.1848471 | -0.1649195 | -0.1425272 | -0.1150026 | 0.0843876 | -0.0316503 | 0.0435410 | 0.1403298 | 0.0491515 | 0.0510729 | 0.0366382 | 0.0098417 | 0.0946020 | -0.0189720 | -0.0503969 | 0.0048063 | -0.0227213 | 0.0212143 | 0.0415201 | -0.0867129 | 0.0212012 | 0.0866767 | 0.0781368 | 0.0742579 | -0.0287661 | -0.0683850 | -0.0967183 | -0.1319780 | -0.0478311 | 0.0027232 | -0.0749480 | 0.0017920 | -0.0595834 | 0.0136214 | 0.0894074 | 0.0639191 | 0.0654692 | 0.0603447 | 0.0328153 | -0.0242059 | 0.0626347 | 0.1808580 |
| NumberOfReadmissions_HIP-KNEE | NumberOfReadmissions_HIP-KNEE | 0.0280292 | -0.0298799 | -0.0742574 | 1.0000000 | 0.0798078 | 0.0740972 | 0.0227895 | 0.0398242 | 0.0599664 | 0.1063912 | 0.0109790 | 0.0177193 | 0.1181224 | 0.1391159 | 0.0779986 | 0.0692166 | 0.0716989 | 0.0290342 | 0.0495607 | 0.0542992 | 0.1106351 | 0.0187289 | 0.0248142 | 0.1116353 | 0.1516789 | 0.0425263 | 0.0450149 | 0.0255192 | 0.0895220 | 0.0668517 | 0.0478896 | -0.0553207 | -0.0126112 | 0.0749623 | 0.0004693 | -0.0353431 | 0.0408255 | 0.0074947 | 0.0166986 | 0.0343487 | -0.0087498 | 0.0347316 | 0.0586463 | 0.0734882 | -0.1517362 | -0.1650703 | -0.0843669 | -0.1375901 | -0.1231721 | -0.1345921 | -0.0339593 | -0.0798121 | -0.0499866 | -0.1384081 | -0.0209899 | -0.0728092 | -0.0689026 | -0.0448292 | -0.0832889 | -0.0662610 | -0.0665941 | -0.0942117 | -0.1234255 |
| PatientSurveyStarRating_H_COMP_1_STAR_RATING | PatientSurveyStarRating_H_COMP_1_STAR_RATING | -0.1590972 | -0.2144148 | -0.1696704 | 0.0798078 | 1.0000000 | 0.7652622 | 0.8092146 | 0.7817407 | 0.6947694 | 0.8094552 | 0.5857397 | 0.6176477 | 0.8113195 | 0.7410800 | 0.8821074 | 0.9413741 | 0.7947199 | 0.8439662 | 0.8192660 | 0.7212981 | 0.8279021 | 0.6062342 | 0.6396807 | 0.8361084 | 0.7761632 | -0.3811431 | -0.0217341 | 0.1991033 | -0.3200784 | -0.1816118 | -0.2152685 | 0.0440556 | 0.0700383 | 0.1401327 | 0.1288469 | 0.0438649 | -0.0683424 | 0.1642083 | 0.1503950 | 0.2176041 | 0.1753654 | 0.1870897 | -0.0859359 | -0.0029321 | -0.0761301 | -0.0479940 | -0.0124762 | 0.1078488 | 0.0183801 | -0.0132887 | -0.0219430 | -0.0320167 | -0.0184090 | 0.0046992 | 0.0835767 | -0.0402520 | -0.1459187 | -0.0669148 | -0.1424419 | 0.0017770 | 0.0329127 | -0.1122411 | -0.2121841 |
| PatientSurveyStarRating_H_COMP_2_STAR_RATING | PatientSurveyStarRating_H_COMP_2_STAR_RATING | -0.1749847 | -0.2287902 | -0.1755397 | 0.0740972 | 0.7652622 | 1.0000000 | 0.6820394 | 0.7310740 | 0.6334942 | 0.7745158 | 0.4946512 | 0.5958859 | 0.7576397 | 0.7012869 | 0.8135436 | 0.7932584 | 0.9499147 | 0.7139678 | 0.7675074 | 0.6648163 | 0.8027744 | 0.5120334 | 0.6187305 | 0.7826816 | 0.7365285 | -0.2976284 | 0.0206259 | 0.2149395 | -0.2473453 | -0.1452767 | -0.1462753 | 0.0003840 | 0.0721589 | 0.0759631 | 0.0920878 | 0.0418658 | -0.0697479 | 0.1295914 | 0.0899079 | 0.1940882 | 0.1763581 | 0.1579604 | -0.0887946 | -0.0231504 | -0.0615070 | -0.0650654 | -0.0163367 | 0.0722015 | -0.0021513 | -0.0165143 | -0.0016914 | -0.0041080 | 0.0301520 | -0.0171646 | 0.0956403 | -0.0267891 | -0.1367919 | -0.0390690 | -0.1430591 | -0.0079719 | 0.0359419 | -0.0871039 | -0.1893123 |
| PatientSurveyStarRating_H_COMP_3_STAR_RATING | PatientSurveyStarRating_H_COMP_3_STAR_RATING | -0.1597408 | -0.2138329 | -0.1644998 | 0.0227895 | 0.8092146 | 0.6820394 | 1.0000000 | 0.7583026 | 0.6569534 | 0.7355634 | 0.5956464 | 0.6138696 | 0.7478412 | 0.6650712 | 0.8260329 | 0.8314522 | 0.7052054 | 0.9423558 | 0.7878299 | 0.6812130 | 0.7546488 | 0.6155860 | 0.6319284 | 0.7774308 | 0.6961148 | -0.3896187 | -0.0747053 | 0.1498815 | -0.3981251 | -0.2241446 | -0.2360570 | 0.0486530 | 0.0507759 | 0.0900327 | 0.1321630 | 0.0521192 | -0.0725481 | 0.1605381 | 0.1368127 | 0.1201354 | 0.1413712 | 0.0905514 | -0.0996751 | -0.0477865 | -0.0405313 | -0.0177776 | 0.0302290 | 0.1568119 | 0.0384967 | 0.0435642 | -0.0192053 | -0.0016542 | 0.0274588 | 0.0159553 | 0.0859168 | -0.0262995 | -0.1292050 | -0.0865925 | -0.1330731 | 0.0092188 | 0.0480143 | -0.1032756 | -0.1564699 |
| PatientSurveyStarRating_H_COMP_5_STAR_RATING | PatientSurveyStarRating_H_COMP_5_STAR_RATING | -0.1551935 | -0.2191170 | -0.1788949 | 0.0398242 | 0.7817407 | 0.7310740 | 0.7583026 | 1.0000000 | 0.6659220 | 0.7694318 | 0.5769327 | 0.5922228 | 0.7502954 | 0.6685334 | 0.8320006 | 0.8038535 | 0.7587947 | 0.7931167 | 0.9410632 | 0.6945471 | 0.7903879 | 0.5987567 | 0.6129310 | 0.7793734 | 0.7066986 | -0.3575914 | -0.0062821 | 0.1693415 | -0.3234673 | -0.1918556 | -0.1939137 | 0.0438526 | 0.0664748 | 0.0784938 | 0.1324347 | 0.0587805 | -0.0562912 | 0.1606626 | 0.1423359 | 0.1358181 | 0.1964279 | 0.1045092 | -0.0999738 | -0.0151359 | -0.0450744 | -0.0510480 | -0.0130589 | 0.0644025 | -0.0046993 | 0.0257903 | -0.0221085 | -0.0185639 | 0.0273930 | -0.0011887 | 0.0897240 | -0.0322016 | -0.1375039 | -0.0537461 | -0.1257130 | -0.0064032 | 0.0443475 | -0.1023631 | -0.1576660 |
| PatientSurveyStarRating_H_COMP_6_STAR_RATING | PatientSurveyStarRating_H_COMP_6_STAR_RATING | -0.1771868 | -0.2029456 | -0.1353119 | 0.0599664 | 0.6947694 | 0.6334942 | 0.6569534 | 0.6659220 | 1.0000000 | 0.7186423 | 0.4784757 | 0.4292218 | 0.6769019 | 0.6409371 | 0.7586740 | 0.7386381 | 0.6584544 | 0.6847212 | 0.7015421 | 0.9400388 | 0.7549683 | 0.4999730 | 0.4567781 | 0.7098506 | 0.6770439 | -0.3008610 | 0.0354071 | 0.2404893 | -0.2071783 | -0.1351116 | -0.1225568 | 0.0625152 | 0.1045421 | 0.1184473 | 0.1443955 | 0.0557260 | -0.0637431 | 0.1782560 | 0.1758781 | 0.2091327 | 0.2034132 | 0.1930126 | 0.0235381 | 0.0288319 | -0.0990420 | -0.0682603 | 0.0034724 | 0.1194712 | -0.0343594 | 0.0159051 | 0.0012755 | 0.0047573 | 0.0312923 | -0.0120380 | 0.0883277 | -0.0190535 | -0.1484276 | -0.0612852 | -0.1424055 | -0.0023314 | 0.0662158 | -0.0911960 | -0.2089917 |
| PatientSurveyStarRating_H_COMP_7_STAR_RATING | PatientSurveyStarRating_H_COMP_7_STAR_RATING | -0.1941803 | -0.2247172 | -0.1533717 | 0.1063912 | 0.8094552 | 0.7745158 | 0.7355634 | 0.7694318 | 0.7186423 | 1.0000000 | 0.5720636 | 0.6101555 | 0.8272215 | 0.7939928 | 0.8743494 | 0.8277780 | 0.7995932 | 0.7605870 | 0.8011319 | 0.7433629 | 0.9482189 | 0.5929605 | 0.6371231 | 0.8571093 | 0.8310741 | -0.3553994 | 0.0397427 | 0.2374707 | -0.2572208 | -0.1626711 | -0.2149719 | 0.0359131 | 0.0858822 | 0.1201701 | 0.1393844 | 0.0474753 | -0.0250525 | 0.1732976 | 0.1422983 | 0.2157600 | 0.1380616 | 0.1802407 | -0.0364693 | 0.0393084 | -0.1067242 | -0.1098730 | -0.0673445 | 0.0151872 | -0.0880905 | -0.0653713 | -0.0300329 | -0.0817158 | -0.0009316 | -0.0348072 | 0.0827573 | -0.0399586 | -0.1668264 | -0.0670446 | -0.1593474 | -0.0164980 | 0.0357574 | -0.1311288 | -0.1977109 |
| PatientSurveyStarRating_H_CLEAN_STAR_RATING | PatientSurveyStarRating_H_CLEAN_STAR_RATING | -0.1160735 | -0.1981594 | -0.1829249 | 0.0109790 | 0.5857397 | 0.4946512 | 0.5956464 | 0.5769327 | 0.4784757 | 0.5720636 | 1.0000000 | 0.4987457 | 0.5965227 | 0.5237951 | 0.6391671 | 0.5928460 | 0.5111389 | 0.6220272 | 0.5982789 | 0.4927072 | 0.5781200 | 0.9570846 | 0.5105668 | 0.6248221 | 0.5505967 | -0.3195008 | -0.0225176 | 0.0814169 | -0.3267508 | -0.1804160 | -0.2479612 | 0.0190913 | 0.0137026 | 0.0598688 | 0.1690161 | 0.0858056 | -0.0192334 | 0.1869822 | 0.1376097 | 0.0367533 | 0.0909053 | 0.0153703 | -0.0848392 | -0.0027352 | -0.0571772 | -0.0630079 | -0.0352455 | 0.0665845 | 0.0009844 | -0.0651509 | -0.0629368 | -0.1272277 | -0.0390292 | 0.0016051 | -0.0069280 | -0.0851994 | -0.1286917 | -0.0807406 | -0.1231640 | -0.0459005 | -0.0007875 | -0.1463724 | -0.0409475 |
| PatientSurveyStarRating_H_QUIET_STAR_RATING | PatientSurveyStarRating_H_QUIET_STAR_RATING | -0.1047116 | -0.1748506 | -0.1621036 | 0.0177193 | 0.6176477 | 0.5958859 | 0.6138696 | 0.5922228 | 0.4292218 | 0.6101555 | 0.4987457 | 1.0000000 | 0.6313199 | 0.5470896 | 0.6730863 | 0.6317832 | 0.6199984 | 0.6414418 | 0.6171349 | 0.4395790 | 0.6249789 | 0.5123475 | 0.9556614 | 0.6537481 | 0.5755335 | -0.3375393 | -0.1460561 | 0.0866767 | -0.3615058 | -0.1742767 | -0.2095914 | 0.0018524 | -0.0044671 | 0.0303764 | 0.0919836 | 0.0246699 | -0.0291912 | 0.1003528 | 0.1010108 | 0.0765341 | 0.1069590 | 0.0440800 | -0.0897193 | -0.0754020 | -0.0250746 | 0.0428372 | 0.0722857 | 0.1510853 | 0.0902457 | 0.0337813 | -0.0457618 | -0.0343542 | -0.0062045 | 0.0093687 | 0.0220979 | -0.0051357 | -0.0772394 | -0.0730231 | -0.1033157 | -0.0277664 | -0.0213211 | -0.0977954 | -0.0612456 |
| PatientSurveyStarRating_H_HSP_RATING_STAR_RATING | PatientSurveyStarRating_H_HSP_RATING_STAR_RATING | -0.1709175 | -0.2016141 | -0.1408372 | 0.1181224 | 0.8113195 | 0.7576397 | 0.7478412 | 0.7502954 | 0.6769019 | 0.8272215 | 0.5965227 | 0.6313199 | 1.0000000 | 0.8595636 | 0.8714961 | 0.8454781 | 0.7811879 | 0.7821354 | 0.7928741 | 0.7081668 | 0.8548308 | 0.6252851 | 0.6598602 | 0.9428150 | 0.9030319 | -0.3410053 | 0.0325646 | 0.2088800 | -0.2358015 | -0.1677838 | -0.2095003 | 0.0163640 | 0.0773879 | 0.1105834 | 0.1692958 | 0.0877507 | 0.0093791 | 0.1897215 | 0.1544748 | 0.2160677 | 0.1589966 | 0.2063053 | -0.0348377 | 0.0691782 | -0.0967251 | -0.0951559 | -0.0338034 | 0.0240132 | -0.0613852 | -0.0814552 | -0.0490180 | -0.0852573 | -0.0166476 | -0.0686735 | 0.0710883 | -0.0270735 | -0.1585705 | -0.0643414 | -0.1472927 | -0.0205437 | 0.0287970 | -0.1387904 | -0.2018848 |
| PatientSurveyStarRating_H_RECMND_STAR_RATING | PatientSurveyStarRating_H_RECMND_STAR_RATING | -0.1659007 | -0.1642097 | -0.0923845 | 0.1391159 | 0.7410800 | 0.7012869 | 0.6650712 | 0.6685334 | 0.6409371 | 0.7939928 | 0.5237951 | 0.5470896 | 0.8595636 | 1.0000000 | 0.7960527 | 0.7806696 | 0.7300827 | 0.6882451 | 0.7037121 | 0.6761773 | 0.8247493 | 0.5530915 | 0.5738169 | 0.9052988 | 0.9480759 | -0.2979673 | 0.0850940 | 0.2212547 | -0.1390380 | -0.1405352 | -0.1692993 | -0.0078546 | 0.0885267 | 0.1190673 | 0.1532363 | 0.0735792 | 0.0265116 | 0.1760486 | 0.1358664 | 0.2348521 | 0.1378362 | 0.2020601 | 0.0037855 | 0.0981665 | -0.1246042 | -0.0981817 | -0.0266626 | -0.0005694 | -0.1107731 | -0.1033645 | -0.0387585 | -0.0857042 | -0.0221487 | -0.0800058 | 0.0787681 | 0.0075443 | -0.1576927 | -0.0393015 | -0.1297623 | 0.0134380 | 0.0262619 | -0.1202753 | -0.2235957 |
| PatientSurveyStarRating_H_STAR_RATING | PatientSurveyStarRating_H_STAR_RATING | -0.1783759 | -0.2308937 | -0.1744075 | 0.0779986 | 0.8821074 | 0.8135436 | 0.8260329 | 0.8320006 | 0.7586740 | 0.8743494 | 0.6391671 | 0.6730863 | 0.8714961 | 0.7960527 | 1.0000000 | 0.8920184 | 0.8314222 | 0.8501602 | 0.8592370 | 0.7769593 | 0.8837377 | 0.6599688 | 0.6959526 | 0.8852220 | 0.8308877 | -0.3910783 | 0.0049679 | 0.2156832 | -0.3210235 | -0.2011786 | -0.2229136 | 0.0373120 | 0.0727680 | 0.1107188 | 0.1500361 | 0.0637937 | -0.0494590 | 0.1800467 | 0.1591593 | 0.1860190 | 0.1945060 | 0.1646616 | -0.0552836 | 0.0051809 | -0.0822439 | -0.0710775 | -0.0009117 | 0.0947303 | -0.0145406 | -0.0275058 | -0.0283146 | -0.0547617 | 0.0061083 | -0.0156145 | 0.0913137 | -0.0452172 | -0.1540619 | -0.0738343 | -0.1614933 | -0.0003415 | 0.0303266 | -0.1263561 | -0.1981220 |
| HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE | -0.1666746 | -0.2067264 | -0.1524571 | 0.0692166 | 0.9413741 | 0.7932584 | 0.8314522 | 0.8038535 | 0.7386381 | 0.8277780 | 0.5928460 | 0.6317832 | 0.8454781 | 0.7806696 | 0.8920184 | 1.0000000 | 0.8343982 | 0.8821498 | 0.8499295 | 0.7853207 | 0.8750498 | 0.6213823 | 0.6671546 | 0.8901557 | 0.8322887 | -0.3639238 | -0.0217481 | 0.2169674 | -0.3107282 | -0.1745392 | -0.2104045 | 0.0422069 | 0.0800014 | 0.1588429 | 0.1603505 | 0.0495956 | -0.0522724 | 0.1983566 | 0.1847906 | 0.2367758 | 0.2012691 | 0.2088467 | -0.0544304 | 0.0173084 | -0.0796721 | -0.0452192 | -0.0003548 | 0.1228267 | 0.0156651 | -0.0170135 | -0.0236955 | -0.0303324 | -0.0019546 | -0.0080622 | 0.0845264 | -0.0270956 | -0.1428877 | -0.0663743 | -0.1459851 | -0.0029199 | 0.0385911 | -0.1096342 | -0.2107792 |
| HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE | -0.1802262 | -0.2352248 | -0.1798454 | 0.0716989 | 0.7947199 | 0.9499147 | 0.7052054 | 0.7587947 | 0.6584544 | 0.7995932 | 0.5111389 | 0.6199984 | 0.7811879 | 0.7300827 | 0.8314222 | 0.8343982 | 1.0000000 | 0.7459134 | 0.8003061 | 0.6977774 | 0.8394911 | 0.5323986 | 0.6495609 | 0.8179081 | 0.7709408 | -0.3129324 | 0.0155923 | 0.2128270 | -0.2565866 | -0.1409829 | -0.1463552 | 0.0002792 | 0.0782762 | 0.0802578 | 0.1011092 | 0.0415646 | -0.0658150 | 0.1433814 | 0.0975333 | 0.1984598 | 0.1920243 | 0.1718119 | -0.0972506 | -0.0314520 | -0.0680571 | -0.0565823 | -0.0027657 | 0.0845920 | 0.0121242 | 0.0058744 | 0.0019106 | -0.0063422 | 0.0334780 | -0.0028885 | 0.0956884 | -0.0236698 | -0.1414366 | -0.0380979 | -0.1394736 | -0.0143285 | 0.0425664 | -0.0826332 | -0.1875503 |
| HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE | -0.1831908 | -0.2449572 | -0.1926175 | 0.0290342 | 0.8439662 | 0.7139678 | 0.9423558 | 0.7931167 | 0.6847212 | 0.7605870 | 0.6220272 | 0.6414418 | 0.7821354 | 0.6882451 | 0.8501602 | 0.8821498 | 0.7459134 | 1.0000000 | 0.8363904 | 0.7154682 | 0.7981273 | 0.6483447 | 0.6695835 | 0.8221836 | 0.7369761 | -0.3677548 | -0.0905698 | 0.1363775 | -0.4107053 | -0.2259603 | -0.2585483 | 0.0412981 | 0.0395669 | 0.0998162 | 0.1511649 | 0.0576566 | -0.0681526 | 0.1772872 | 0.1643799 | 0.1340445 | 0.1572709 | 0.1065507 | -0.1148325 | -0.0405641 | -0.0605862 | -0.0215130 | 0.0433887 | 0.1663036 | 0.0511012 | 0.0476323 | -0.0223315 | -0.0117228 | 0.0152153 | 0.0187552 | 0.0790559 | -0.0330533 | -0.1408040 | -0.0953948 | -0.1388091 | -0.0063199 | 0.0405341 | -0.1131291 | -0.1799323 |
| HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE | -0.1775098 | -0.2546502 | -0.2093859 | 0.0495607 | 0.8192660 | 0.7675074 | 0.7878299 | 0.9410632 | 0.7015421 | 0.8011319 | 0.5982789 | 0.6171349 | 0.7928741 | 0.7037121 | 0.8592370 | 0.8499295 | 0.8003061 | 0.8363904 | 1.0000000 | 0.7405838 | 0.8369260 | 0.6236211 | 0.6436063 | 0.8239488 | 0.7494949 | -0.3259347 | -0.0050286 | 0.1728899 | -0.3312654 | -0.1928398 | -0.1986827 | 0.0259892 | 0.0644512 | 0.0816216 | 0.1501584 | 0.0726417 | -0.0510706 | 0.1823598 | 0.1625916 | 0.1647114 | 0.2084062 | 0.1450104 | -0.1112139 | -0.0255467 | -0.0557335 | -0.0641464 | -0.0091794 | 0.0704411 | -0.0036037 | 0.0323273 | -0.0128473 | -0.0052418 | 0.0247052 | -0.0039459 | 0.0793646 | -0.0452983 | -0.1557561 | -0.0685223 | -0.1386420 | -0.0063967 | 0.0420468 | -0.1086247 | -0.1660056 |
| HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE | -0.1844376 | -0.2079982 | -0.1348924 | 0.0542992 | 0.7212981 | 0.6648163 | 0.6812130 | 0.6945471 | 0.9400388 | 0.7433629 | 0.4927072 | 0.4395790 | 0.7081668 | 0.6761773 | 0.7769593 | 0.7853207 | 0.6977774 | 0.7154682 | 0.7405838 | 1.0000000 | 0.7968806 | 0.5182906 | 0.4744589 | 0.7556492 | 0.7238004 | -0.2900150 | 0.0369924 | 0.2564050 | -0.2066316 | -0.1376094 | -0.1292560 | 0.0701172 | 0.1220570 | 0.1274513 | 0.1720468 | 0.0650279 | -0.0549999 | 0.2122738 | 0.2084118 | 0.2236306 | 0.2431733 | 0.2738285 | 0.0383041 | 0.0559072 | -0.1078990 | -0.0733482 | 0.0081764 | 0.1281267 | -0.0315360 | 0.0086356 | 0.0049870 | 0.0228481 | 0.0256121 | -0.0176853 | 0.0859133 | -0.0145875 | -0.1539039 | -0.0536226 | -0.1430776 | 0.0069610 | 0.0649921 | -0.0872152 | -0.2310210 |
| HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE | HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE | -0.1877798 | -0.2250397 | -0.1608376 | 0.1106351 | 0.8279021 | 0.8027744 | 0.7546488 | 0.7903879 | 0.7549683 | 0.9482189 | 0.5781200 | 0.6249789 | 0.8548308 | 0.8247493 | 0.8837377 | 0.8750498 | 0.8394911 | 0.7981273 | 0.8369260 | 0.7968806 | 1.0000000 | 0.6025716 | 0.6612420 | 0.9026621 | 0.8770384 | -0.3383196 | 0.0347599 | 0.2352166 | -0.2492489 | -0.1534787 | -0.2043225 | 0.0287005 | 0.0923400 | 0.1255794 | 0.1636239 | 0.0630912 | -0.0194129 | 0.2014907 | 0.1664733 | 0.2341396 | 0.1543470 | 0.2096414 | -0.0268317 | 0.0534239 | -0.1051209 | -0.1115171 | -0.0514076 | 0.0259241 | -0.0829950 | -0.0564195 | -0.0211932 | -0.0638071 | 0.0098029 | -0.0381375 | 0.0825062 | -0.0316508 | -0.1629600 | -0.0614646 | -0.1636005 | -0.0189605 | 0.0320446 | -0.1202132 | -0.2069970 |
| HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE | HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE | -0.1223047 | -0.2045779 | -0.1848471 | 0.0187289 | 0.6062342 | 0.5120334 | 0.6155860 | 0.5987567 | 0.4999730 | 0.5929605 | 0.9570846 | 0.5123475 | 0.6252851 | 0.5530915 | 0.6599688 | 0.6213823 | 0.5323986 | 0.6483447 | 0.6236211 | 0.5182906 | 0.6025716 | 1.0000000 | 0.5250997 | 0.6604779 | 0.5844106 | -0.3337655 | -0.0182565 | 0.0860619 | -0.3285392 | -0.1777808 | -0.2595566 | 0.0253586 | 0.0135852 | 0.0584827 | 0.1855312 | 0.1034169 | -0.0099818 | 0.2012726 | 0.1514801 | 0.0455756 | 0.1000540 | 0.0284015 | -0.0880614 | -0.0051760 | -0.0539829 | -0.0624749 | -0.0255442 | 0.0677998 | -0.0009274 | -0.0795298 | -0.0709592 | -0.1182074 | -0.0369162 | 0.0027541 | -0.0109302 | -0.0753999 | -0.1233582 | -0.0813748 | -0.1261494 | -0.0505286 | 0.0061946 | -0.1486299 | -0.0439537 |
| HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE | HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE | -0.1123794 | -0.1801665 | -0.1649195 | 0.0248142 | 0.6396807 | 0.6187305 | 0.6319284 | 0.6129310 | 0.4567781 | 0.6371231 | 0.5105668 | 0.9556614 | 0.6598602 | 0.5738169 | 0.6959526 | 0.6671546 | 0.6495609 | 0.6695835 | 0.6436063 | 0.4744589 | 0.6612420 | 0.5250997 | 1.0000000 | 0.6896579 | 0.6107149 | -0.3437480 | -0.1442006 | 0.1064024 | -0.3661649 | -0.1699262 | -0.2145544 | -0.0022965 | 0.0082056 | 0.0494650 | 0.0908135 | 0.0158871 | -0.0303120 | 0.1023846 | 0.1022135 | 0.0817038 | 0.1098322 | 0.0437557 | -0.0886019 | -0.0634120 | -0.0341816 | 0.0366437 | 0.0773892 | 0.1553155 | 0.0950135 | 0.0255069 | -0.0454692 | -0.0450913 | -0.0062861 | 0.0051695 | 0.0262747 | -0.0054371 | -0.0789391 | -0.0813685 | -0.1096072 | -0.0256155 | -0.0196124 | -0.1003157 | -0.0653730 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | -0.1779322 | -0.2060912 | -0.1425272 | 0.1116353 | 0.8361084 | 0.7826816 | 0.7774308 | 0.7793734 | 0.7098506 | 0.8571093 | 0.6248221 | 0.6537481 | 0.9428150 | 0.9052988 | 0.8852220 | 0.8901557 | 0.8179081 | 0.8221836 | 0.8239488 | 0.7556492 | 0.9026621 | 0.6604779 | 0.6896579 | 1.0000000 | 0.9580767 | -0.3521196 | 0.0302154 | 0.2182505 | -0.2448842 | -0.1752428 | -0.2290270 | 0.0077229 | 0.0865028 | 0.1100002 | 0.1826272 | 0.0844502 | 0.0072643 | 0.2115650 | 0.1691943 | 0.2211843 | 0.1738457 | 0.2315523 | -0.0248375 | 0.0775236 | -0.1091775 | -0.0952111 | -0.0230632 | 0.0300940 | -0.0702915 | -0.0760295 | -0.0499063 | -0.0948401 | -0.0054742 | -0.0590998 | 0.0707774 | -0.0157615 | -0.1615833 | -0.0674588 | -0.1458284 | -0.0172231 | 0.0304634 | -0.1390570 | -0.2108956 |
| HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE | HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE | -0.1852873 | -0.1911254 | -0.1150026 | 0.1516789 | 0.7761632 | 0.7365285 | 0.6961148 | 0.7066986 | 0.6770439 | 0.8310741 | 0.5505967 | 0.5755335 | 0.9030319 | 0.9480759 | 0.8308877 | 0.8322887 | 0.7709408 | 0.7369761 | 0.7494949 | 0.7238004 | 0.8770384 | 0.5844106 | 0.6107149 | 0.9580767 | 1.0000000 | -0.2951331 | 0.0831560 | 0.2197432 | -0.1585545 | -0.1471003 | -0.1980854 | -0.0179814 | 0.0967347 | 0.1229267 | 0.1676529 | 0.0687394 | 0.0292287 | 0.1933820 | 0.1501468 | 0.2439658 | 0.1443141 | 0.2442083 | 0.0058311 | 0.1184210 | -0.1332048 | -0.1095716 | -0.0262870 | -0.0003806 | -0.1100101 | -0.1074875 | -0.0363290 | -0.1008569 | -0.0193730 | -0.0826240 | 0.0862428 | -0.0105068 | -0.1710237 | -0.0510987 | -0.1364494 | 0.0003983 | 0.0331751 | -0.1279752 | -0.2364653 |
| ED_2_Strata_1 | ED_2_Strata_1 | 0.0762327 | 0.1082272 | 0.0843876 | 0.0425263 | -0.3811431 | -0.2976284 | -0.3896187 | -0.3575914 | -0.3008610 | -0.3553994 | -0.3195008 | -0.3375393 | -0.3410053 | -0.2979673 | -0.3910783 | -0.3639238 | -0.3129324 | -0.3677548 | -0.3259347 | -0.2900150 | -0.3383196 | -0.3337655 | -0.3437480 | -0.3521196 | -0.2951331 | 1.0000000 | 0.1248128 | 0.0186099 | 0.5775206 | 0.4204676 | 0.3419958 | -0.0684821 | 0.0164159 | -0.0486040 | -0.1152627 | -0.0615764 | -0.0354024 | -0.1265965 | -0.0558505 | -0.0983294 | 0.0254940 | -0.0925059 | 0.0761139 | 0.0321584 | 0.0673396 | 0.0579101 | -0.0537002 | -0.1136600 | -0.0389080 | 0.0005912 | 0.0602674 | -0.0541503 | 0.0551291 | -0.0211509 | -0.0336614 | 0.0315830 | 0.1430203 | 0.1038617 | 0.0817412 | 0.0633107 | -0.0393782 | 0.1286484 | 0.1212071 |
| HCP_COVID_19 | HCP_COVID_19 | -0.0508152 | -0.0563082 | -0.0316503 | 0.0450149 | -0.0217341 | 0.0206259 | -0.0747053 | -0.0062821 | 0.0354071 | 0.0397427 | -0.0225176 | -0.1460561 | 0.0325646 | 0.0850940 | 0.0049679 | -0.0217481 | 0.0155923 | -0.0905698 | -0.0050286 | 0.0369924 | 0.0347599 | -0.0182565 | -0.1442006 | 0.0302154 | 0.0831560 | 0.1248128 | 1.0000000 | 0.3203622 | 0.2574291 | 0.0698819 | 0.1122982 | -0.0306155 | 0.1067941 | -0.0812735 | -0.0345104 | 0.0310392 | -0.0124470 | -0.0175650 | -0.1149175 | 0.0947908 | 0.0304334 | 0.0831624 | 0.0241622 | -0.0151698 | -0.0510683 | -0.0869890 | -0.1128278 | -0.1245435 | -0.1523779 | -0.0988833 | 0.0943953 | 0.0417007 | 0.0225916 | -0.0272232 | 0.0549990 | 0.0009222 | -0.0909811 | 0.1091949 | -0.0160408 | 0.0169512 | 0.0430812 | 0.0534106 | -0.0627505 |
| IMM_3 | IMM_3 | -0.0583326 | -0.0028840 | 0.0435410 | 0.0255192 | 0.1991033 | 0.2149395 | 0.1498815 | 0.1693415 | 0.2404893 | 0.2374707 | 0.0814169 | 0.0866767 | 0.2088800 | 0.2212547 | 0.2156832 | 0.2169674 | 0.2128270 | 0.1363775 | 0.1728899 | 0.2564050 | 0.2352166 | 0.0860619 | 0.1064024 | 0.2182505 | 0.2197432 | 0.0186099 | 0.3203622 | 1.0000000 | 0.1105628 | 0.0343661 | 0.0560372 | 0.0400235 | 0.1317922 | 0.0410289 | 0.0439297 | 0.0519538 | -0.0201631 | 0.0361783 | 0.0484998 | 0.1318005 | 0.1020708 | 0.0900367 | 0.0906329 | 0.0058144 | -0.0212916 | -0.0165321 | -0.0616051 | -0.0010634 | -0.0761104 | 0.0146397 | 0.0451508 | 0.0601831 | 0.0544576 | -0.0311579 | 0.0899361 | 0.0412713 | -0.0625676 | 0.0594933 | -0.0250714 | 0.0723872 | 0.0625753 | 0.0226639 | -0.0720431 |
| OP_18b | OP_18b | 0.0505288 | 0.1295727 | 0.1403298 | 0.0895220 | -0.3200784 | -0.2473453 | -0.3981251 | -0.3234673 | -0.2071783 | -0.2572208 | -0.3267508 | -0.3615058 | -0.2358015 | -0.1390380 | -0.3210235 | -0.3107282 | -0.2565866 | -0.4107053 | -0.3312654 | -0.2066316 | -0.2492489 | -0.3285392 | -0.3661649 | -0.2448842 | -0.1585545 | 0.5775206 | 0.2574291 | 0.1105628 | 1.0000000 | 0.4959758 | 0.5894838 | -0.0756249 | 0.0506067 | -0.1400845 | -0.1714513 | -0.0557159 | 0.0160417 | -0.2007893 | -0.1593730 | 0.0723310 | -0.1190028 | 0.0948584 | 0.2344307 | 0.0629824 | -0.0293698 | -0.0678837 | -0.1527290 | -0.2195933 | -0.1858291 | -0.0905644 | 0.0583644 | 0.0638412 | 0.0187794 | -0.0806516 | 0.0224833 | 0.0544528 | -0.0076797 | 0.1653812 | 0.0993570 | 0.0676972 | 0.0374530 | 0.0986077 | -0.0241965 |
| OP_18c | OP_18c | 0.0838395 | 0.0877808 | 0.0491515 | 0.0668517 | -0.1816118 | -0.1452767 | -0.2241446 | -0.1918556 | -0.1351116 | -0.1626711 | -0.1804160 | -0.1742767 | -0.1677838 | -0.1405352 | -0.2011786 | -0.1745392 | -0.1409829 | -0.2259603 | -0.1928398 | -0.1376094 | -0.1534787 | -0.1777808 | -0.1699262 | -0.1752428 | -0.1471003 | 0.4204676 | 0.0698819 | 0.0343661 | 0.4959758 | 1.0000000 | 0.3393524 | 0.0090382 | 0.0430774 | -0.0573485 | -0.0726374 | -0.0430698 | 0.0018369 | -0.0844129 | -0.0481731 | -0.0076033 | -0.0476864 | 0.0158359 | 0.1235925 | 0.0408574 | 0.0028501 | -0.0296718 | -0.0977614 | -0.1375348 | -0.0578145 | -0.0463395 | 0.0181970 | 0.0046272 | 0.0392571 | -0.0457654 | -0.0128057 | 0.0031479 | 0.0074122 | 0.0486693 | 0.0559662 | 0.0192234 | 0.0049833 | 0.0401196 | 0.0264108 |
| OP_22 | OP_22 | 0.0252830 | 0.0501727 | 0.0510729 | 0.0478896 | -0.2152685 | -0.1462753 | -0.2360570 | -0.1939137 | -0.1225568 | -0.2149719 | -0.2479612 | -0.2095914 | -0.2095003 | -0.1692993 | -0.2229136 | -0.2104045 | -0.1463552 | -0.2585483 | -0.1986827 | -0.1292560 | -0.2043225 | -0.2595566 | -0.2145544 | -0.2290270 | -0.1980854 | 0.3419958 | 0.1122982 | 0.0560372 | 0.5894838 | 0.3393524 | 1.0000000 | -0.0949210 | 0.0293870 | -0.1014242 | -0.2178259 | -0.1066885 | -0.0990718 | -0.2279495 | -0.1444805 | -0.0066518 | -0.0291353 | 0.0057888 | 0.0805708 | -0.0317460 | 0.0214909 | 0.0376811 | -0.0446228 | -0.0767863 | -0.0559254 | 0.0198257 | 0.0648778 | 0.1019307 | 0.0452860 | 0.0102293 | 0.0426036 | 0.0343100 | 0.0397075 | 0.0778917 | 0.0451515 | 0.0652307 | 0.0094012 | 0.0926476 | -0.0237710 |
| OP_23 | OP_23 | 0.0485500 | 0.0614226 | 0.0366382 | -0.0553207 | 0.0440556 | 0.0003840 | 0.0486530 | 0.0438526 | 0.0625152 | 0.0359131 | 0.0190913 | 0.0018524 | 0.0163640 | -0.0078546 | 0.0373120 | 0.0422069 | 0.0002792 | 0.0412981 | 0.0259892 | 0.0701172 | 0.0287005 | 0.0253586 | -0.0022965 | 0.0077229 | -0.0179814 | -0.0684821 | -0.0306155 | 0.0400235 | -0.0756249 | 0.0090382 | -0.0949210 | 1.0000000 | 0.0732988 | 0.0371036 | 0.1919253 | 0.1198314 | 0.0629739 | 0.1834131 | 0.1440580 | 0.0821585 | 0.1352184 | 0.0401311 | 0.2485615 | 0.1664553 | 0.0398944 | 0.0034495 | 0.0135325 | 0.0045056 | -0.0210473 | -0.0614632 | -0.0570477 | -0.0326986 | -0.0515403 | 0.0269822 | -0.0140788 | -0.0045339 | 0.0279871 | -0.0606628 | -0.0384416 | -0.0593032 | -0.0270588 | -0.0530033 | 0.0545394 |
| OP_29 | OP_29 | -0.0296026 | -0.0106510 | 0.0098417 | -0.0126112 | 0.0700383 | 0.0721589 | 0.0507759 | 0.0664748 | 0.1045421 | 0.0858822 | 0.0137026 | -0.0044671 | 0.0773879 | 0.0885267 | 0.0727680 | 0.0800014 | 0.0782762 | 0.0395669 | 0.0644512 | 0.1220570 | 0.0923400 | 0.0135852 | 0.0082056 | 0.0865028 | 0.0967347 | 0.0164159 | 0.1067941 | 0.1317922 | 0.0506067 | 0.0430774 | 0.0293870 | 0.0732988 | 1.0000000 | -0.0650231 | 0.0952846 | 0.0685334 | 0.0455411 | 0.1111026 | 0.0304354 | 0.0653732 | 0.0263917 | 0.0285800 | 0.1567526 | 0.0271825 | -0.0096464 | -0.0600569 | 0.0081252 | 0.0099705 | -0.0536184 | -0.0251654 | -0.0032584 | 0.0312780 | 0.0059688 | -0.0199006 | 0.0150699 | 0.0331569 | -0.0837910 | -0.0108546 | -0.0087678 | 0.0163779 | 0.0419068 | -0.0312757 | -0.0815209 |
| SAFE_USE_OF_OPIOIDS | SAFE_USE_OF_OPIOIDS | 0.0627005 | 0.1063002 | 0.0946020 | 0.0749623 | 0.1401327 | 0.0759631 | 0.0900327 | 0.0784938 | 0.1184473 | 0.1201701 | 0.0598688 | 0.0303764 | 0.1105834 | 0.1190673 | 0.1107188 | 0.1588429 | 0.0802578 | 0.0998162 | 0.0816216 | 0.1274513 | 0.1255794 | 0.0584827 | 0.0494650 | 0.1100002 | 0.1229267 | -0.0486040 | -0.0812735 | 0.0410289 | -0.1400845 | -0.0573485 | -0.1014242 | 0.0371036 | -0.0650231 | 1.0000000 | 0.0650110 | 0.0100193 | 0.0697614 | 0.0854069 | 0.0857646 | 0.1450732 | 0.0858291 | 0.1347354 | -0.0563373 | 0.1913961 | -0.0081923 | -0.0643353 | -0.0362573 | 0.0171605 | -0.0204107 | -0.0804707 | -0.0287158 | -0.0995591 | -0.0603629 | -0.0017558 | -0.0043396 | -0.0382369 | 0.0137283 | -0.0380145 | -0.0258097 | -0.0166146 | -0.0268805 | -0.0300979 | -0.0048449 |
| SEP_1 | SEP_1 | -0.0309171 | -0.0326387 | -0.0189720 | 0.0004693 | 0.1288469 | 0.0920878 | 0.1321630 | 0.1324347 | 0.1443955 | 0.1393844 | 0.1690161 | 0.0919836 | 0.1692958 | 0.1532363 | 0.1500361 | 0.1603505 | 0.1011092 | 0.1511649 | 0.1501584 | 0.1720468 | 0.1636239 | 0.1855312 | 0.0908135 | 0.1826272 | 0.1676529 | -0.1152627 | -0.0345104 | 0.0439297 | -0.1714513 | -0.0726374 | -0.2178259 | 0.1919253 | 0.0952846 | 0.0650110 | 1.0000000 | 0.7309445 | 0.5744973 | 0.8329106 | 0.6460143 | 0.0976908 | 0.0923975 | 0.1245526 | 0.2303117 | 0.2029424 | -0.0347567 | -0.0154393 | 0.0197865 | 0.0170609 | -0.0335808 | -0.0769699 | -0.0797702 | -0.0937443 | -0.0339321 | -0.0179990 | -0.0245784 | -0.0390884 | -0.0564170 | -0.0757690 | -0.0254596 | -0.0644595 | -0.0051105 | -0.1034135 | -0.0078309 |
| SEP_SH_3HR | SEP_SH_3HR | -0.0529663 | -0.0689850 | -0.0503969 | -0.0353431 | 0.0438649 | 0.0418658 | 0.0521192 | 0.0587805 | 0.0557260 | 0.0474753 | 0.0858056 | 0.0246699 | 0.0877507 | 0.0735792 | 0.0637937 | 0.0495956 | 0.0415646 | 0.0576566 | 0.0726417 | 0.0650279 | 0.0630912 | 0.1034169 | 0.0158871 | 0.0844502 | 0.0687394 | -0.0615764 | 0.0310392 | 0.0519538 | -0.0557159 | -0.0430698 | -0.1066885 | 0.1198314 | 0.0685334 | 0.0100193 | 0.7309445 | 1.0000000 | 0.3894182 | 0.5124276 | 0.3029233 | 0.0214458 | -0.0003039 | 0.0083388 | 0.0443562 | 0.0597759 | -0.0068112 | 0.0143010 | 0.0121194 | 0.0396301 | 0.0312695 | -0.0311801 | -0.0131473 | 0.0329404 | 0.0208887 | -0.0022962 | 0.0293689 | -0.0176156 | -0.0367514 | -0.0072258 | 0.0030403 | -0.0148982 | 0.0376698 | -0.0243705 | 0.0044851 |
| SEP_SH_6HR | SEP_SH_6HR | 0.0313734 | 0.0231777 | 0.0048063 | 0.0408255 | -0.0683424 | -0.0697479 | -0.0725481 | -0.0562912 | -0.0637431 | -0.0250525 | -0.0192334 | -0.0291912 | 0.0093791 | 0.0265116 | -0.0494590 | -0.0522724 | -0.0658150 | -0.0681526 | -0.0510706 | -0.0549999 | -0.0194129 | -0.0099818 | -0.0303120 | 0.0072643 | 0.0292287 | -0.0354024 | -0.0124470 | -0.0201631 | 0.0160417 | 0.0018369 | -0.0990718 | 0.0629739 | 0.0455411 | 0.0697614 | 0.5744973 | 0.3894182 | 1.0000000 | 0.3849719 | 0.2338399 | -0.0161607 | -0.0334702 | -0.0156808 | 0.1359384 | 0.1422354 | -0.0041075 | -0.0631275 | -0.0279555 | -0.0515598 | -0.0787945 | -0.0854897 | -0.0659226 | -0.0727333 | -0.0738364 | -0.0368684 | -0.0345921 | 0.0185638 | -0.0310002 | -0.0467304 | -0.0030434 | -0.0228048 | 0.0032891 | -0.0713756 | 0.0072647 |
| SEV_SEP_3HR | SEV_SEP_3HR | -0.0322822 | -0.0375511 | -0.0227213 | 0.0074947 | 0.1642083 | 0.1295914 | 0.1605381 | 0.1606626 | 0.1782560 | 0.1732976 | 0.1869822 | 0.1003528 | 0.1897215 | 0.1760486 | 0.1800467 | 0.1983566 | 0.1433814 | 0.1772872 | 0.1823598 | 0.2122738 | 0.2014907 | 0.2012726 | 0.1023846 | 0.2115650 | 0.1933820 | -0.1265965 | -0.0175650 | 0.0361783 | -0.2007893 | -0.0844129 | -0.2279495 | 0.1834131 | 0.1111026 | 0.0854069 | 0.8329106 | 0.5124276 | 0.3849719 | 1.0000000 | 0.4694083 | 0.1935342 | 0.1476882 | 0.2318112 | 0.2111767 | 0.2098349 | -0.0375621 | -0.0091369 | 0.0251530 | 0.0425397 | -0.0246311 | -0.0663090 | -0.0624234 | -0.0956269 | -0.0378140 | -0.0141213 | -0.0257851 | -0.0434036 | -0.0699820 | -0.0642500 | -0.0485933 | -0.0700502 | -0.0047427 | -0.0993958 | -0.0154257 |
| SEV_SEP_6HR | SEV_SEP_6HR | -0.0278551 | -0.0014846 | 0.0212143 | 0.0166986 | 0.1503950 | 0.0899079 | 0.1368127 | 0.1423359 | 0.1758781 | 0.1422983 | 0.1376097 | 0.1010108 | 0.1544748 | 0.1358664 | 0.1591593 | 0.1847906 | 0.0975333 | 0.1643799 | 0.1625916 | 0.2084118 | 0.1664733 | 0.1514801 | 0.1022135 | 0.1691943 | 0.1501468 | -0.0558505 | -0.1149175 | 0.0484998 | -0.1593730 | -0.0481731 | -0.1444805 | 0.1440580 | 0.0304354 | 0.0857646 | 0.6460143 | 0.3029233 | 0.2338399 | 0.4694083 | 1.0000000 | 0.0037384 | 0.0649257 | 0.0631441 | 0.2522518 | 0.2083065 | -0.0109117 | -0.0072140 | 0.0299786 | 0.0287674 | -0.0152374 | -0.0457369 | -0.0572546 | -0.0824732 | -0.0074196 | -0.0351047 | -0.0587674 | -0.0265019 | -0.0400763 | -0.0886155 | -0.0494623 | -0.0377753 | -0.0179532 | -0.0875069 | -0.0092648 |
| STK_02 | STK_02 | -0.0434539 | 0.0066866 | 0.0415201 | 0.0343487 | 0.2176041 | 0.1940882 | 0.1201354 | 0.1358181 | 0.2091327 | 0.2157600 | 0.0367533 | 0.0765341 | 0.2160677 | 0.2348521 | 0.1860190 | 0.2367758 | 0.1984598 | 0.1340445 | 0.1647114 | 0.2236306 | 0.2341396 | 0.0455756 | 0.0817038 | 0.2211843 | 0.2439658 | -0.0983294 | 0.0947908 | 0.1318005 | 0.0723310 | -0.0076033 | -0.0066518 | 0.0821585 | 0.0653732 | 0.1450732 | 0.0976908 | 0.0214458 | -0.0161607 | 0.1935342 | 0.0037384 | 1.0000000 | 0.3266488 | 0.8021725 | 0.4631419 | 0.4148180 | -0.0319671 | -0.0518587 | -0.0534076 | 0.0093937 | -0.0939059 | -0.0905922 | -0.0044494 | -0.0746093 | 0.0129386 | -0.0292526 | 0.0219699 | -0.0138235 | -0.0675289 | 0.0183344 | -0.0210595 | 0.0392025 | -0.0065909 | -0.0294786 | -0.1317922 |
| STK_05 | STK_05 | -0.0205857 | -0.0801055 | -0.0867129 | -0.0087498 | 0.1753654 | 0.1763581 | 0.1413712 | 0.1964279 | 0.2034132 | 0.1380616 | 0.0909053 | 0.1069590 | 0.1589966 | 0.1378362 | 0.1945060 | 0.2012691 | 0.1920243 | 0.1572709 | 0.2084062 | 0.2431733 | 0.1543470 | 0.1000540 | 0.1098322 | 0.1738457 | 0.1443141 | 0.0254940 | 0.0304334 | 0.1020708 | -0.1190028 | -0.0476864 | -0.0291353 | 0.1352184 | 0.0263917 | 0.0858291 | 0.0923975 | -0.0003039 | -0.0334702 | 0.1476882 | 0.0649257 | 0.3266488 | 1.0000000 | 0.2500620 | 0.5880079 | 0.6976272 | 0.0512645 | 0.0331009 | -0.0155739 | 0.1007144 | -0.0037918 | -0.0246261 | 0.0132861 | -0.0270607 | -0.0072465 | 0.0776413 | 0.0066697 | -0.0467887 | -0.0263715 | -0.0426109 | -0.0270503 | 0.0071556 | 0.0285160 | -0.0154089 | -0.0328044 |
| STK_06 | STK_06 | -0.0425563 | -0.0101158 | 0.0212012 | 0.0347316 | 0.1870897 | 0.1579604 | 0.0905514 | 0.1045092 | 0.1930126 | 0.1802407 | 0.0153703 | 0.0440800 | 0.2063053 | 0.2020601 | 0.1646616 | 0.2088467 | 0.1718119 | 0.1065507 | 0.1450104 | 0.2738285 | 0.2096414 | 0.0284015 | 0.0437557 | 0.2315523 | 0.2442083 | -0.0925059 | 0.0831624 | 0.0900367 | 0.0948584 | 0.0158359 | 0.0057888 | 0.0401311 | 0.0285800 | 0.1347354 | 0.1245526 | 0.0083388 | -0.0156808 | 0.2318112 | 0.0631441 | 0.8021725 | 0.2500620 | 1.0000000 | 0.4774227 | 0.5132722 | -0.0207810 | -0.0497188 | -0.0477381 | -0.0086007 | -0.0990850 | -0.0787235 | -0.0203047 | -0.0479955 | -0.0035153 | -0.0646189 | 0.0144757 | -0.0298356 | -0.0454564 | 0.0343065 | -0.0293121 | 0.0542351 | 0.0085622 | -0.0284310 | -0.1296356 |
| VTE_1 | VTE_1 | 0.0039309 | 0.0654668 | 0.0866767 | 0.0586463 | -0.0859359 | -0.0887946 | -0.0996751 | -0.0999738 | 0.0235381 | -0.0364693 | -0.0848392 | -0.0897193 | -0.0348377 | 0.0037855 | -0.0552836 | -0.0544304 | -0.0972506 | -0.1148325 | -0.1112139 | 0.0383041 | -0.0268317 | -0.0880614 | -0.0886019 | -0.0248375 | 0.0058311 | 0.0761139 | 0.0241622 | 0.0906329 | 0.2344307 | 0.1235925 | 0.0805708 | 0.2485615 | 0.1567526 | -0.0563373 | 0.2303117 | 0.0443562 | 0.1359384 | 0.2111767 | 0.2522518 | 0.4631419 | 0.5880079 | 0.4774227 | 1.0000000 | 0.8736490 | -0.0526925 | -0.0493931 | -0.0282911 | -0.1051171 | -0.1710235 | -0.1238916 | -0.0378500 | -0.1180160 | -0.0262166 | -0.0522876 | -0.0577213 | -0.0037803 | -0.0301775 | -0.0317534 | -0.0440578 | 0.0086141 | 0.0326658 | -0.0475558 | -0.1256363 |
| VTE_2 | VTE_2 | 0.0707936 | 0.1064994 | 0.0781368 | 0.0734882 | -0.0029321 | -0.0231504 | -0.0477865 | -0.0151359 | 0.0288319 | 0.0393084 | -0.0027352 | -0.0754020 | 0.0691782 | 0.0981665 | 0.0051809 | 0.0173084 | -0.0314520 | -0.0405641 | -0.0255467 | 0.0559072 | 0.0534239 | -0.0051760 | -0.0634120 | 0.0775236 | 0.1184210 | 0.0321584 | -0.0151698 | 0.0058144 | 0.0629824 | 0.0408574 | -0.0317460 | 0.1664553 | 0.0271825 | 0.1913961 | 0.2029424 | 0.0597759 | 0.1422354 | 0.2098349 | 0.2083065 | 0.4148180 | 0.6976272 | 0.5132722 | 0.8736490 | 1.0000000 | -0.0073599 | -0.0995881 | -0.1188383 | -0.1492708 | -0.1770482 | -0.1181448 | -0.0713006 | -0.1762469 | -0.0469516 | -0.1120007 | -0.0691325 | 0.0036343 | -0.0247178 | -0.0036828 | 0.0009109 | 0.0110118 | -0.0279567 | -0.0666219 | 0.0217206 |
| Score_COMP_HIP_KNEE | Score_COMP_HIP_KNEE | 0.4350513 | 0.3208550 | 0.0742579 | -0.1517362 | -0.0761301 | -0.0615070 | -0.0405313 | -0.0450744 | -0.0990420 | -0.1067242 | -0.0571772 | -0.0250746 | -0.0967251 | -0.1246042 | -0.0822439 | -0.0796721 | -0.0680571 | -0.0605862 | -0.0557335 | -0.1078990 | -0.1051209 | -0.0539829 | -0.0341816 | -0.1091775 | -0.1332048 | 0.0673396 | -0.0510683 | -0.0212916 | -0.0293698 | 0.0028501 | 0.0214909 | 0.0398944 | -0.0096464 | -0.0081923 | -0.0347567 | -0.0068112 | -0.0041075 | -0.0375621 | -0.0109117 | -0.0319671 | 0.0512645 | -0.0207810 | -0.0526925 | -0.0073599 | 1.0000000 | 0.0830479 | -0.0203930 | -0.0007242 | 0.0241066 | 0.0211621 | 0.0498557 | 0.0038509 | 0.0505415 | 0.0577776 | 0.0540124 | 0.0813038 | 0.1279724 | 0.1458258 | 0.1334619 | 0.0498603 | 0.0433809 | 0.1604802 | 0.3410864 |
| Score_MORT_30_AMI | Score_MORT_30_AMI | 0.0392653 | 0.0074065 | -0.0287661 | -0.1650703 | -0.0479940 | -0.0650654 | -0.0177776 | -0.0510480 | -0.0682603 | -0.1098730 | -0.0630079 | 0.0428372 | -0.0951559 | -0.0981817 | -0.0710775 | -0.0452192 | -0.0565823 | -0.0215130 | -0.0641464 | -0.0733482 | -0.1115171 | -0.0624749 | 0.0366437 | -0.0952111 | -0.1095716 | 0.0579101 | -0.0869890 | -0.0165321 | -0.0678837 | -0.0296718 | 0.0376811 | 0.0034495 | -0.0600569 | -0.0643353 | -0.0154393 | 0.0143010 | -0.0631275 | -0.0091369 | -0.0072140 | -0.0518587 | 0.0331009 | -0.0497188 | -0.0493931 | -0.0995881 | 0.0830479 | 1.0000000 | 0.2498600 | 0.3407616 | 0.3309425 | 0.2222539 | 0.0415523 | 0.2105379 | 0.0885083 | 0.1010348 | 0.0889343 | 0.1066619 | 0.1037006 | 0.0492328 | 0.0467554 | 0.0454462 | 0.0297688 | 0.1129695 | 0.0591548 |
| Score_MORT_30_COPD | Score_MORT_30_COPD | -0.0452304 | -0.0794948 | -0.0683850 | -0.0843669 | -0.0124762 | -0.0163367 | 0.0302290 | -0.0130589 | 0.0034724 | -0.0673445 | -0.0352455 | 0.0722857 | -0.0338034 | -0.0266626 | -0.0009117 | -0.0003548 | -0.0027657 | 0.0433887 | -0.0091794 | 0.0081764 | -0.0514076 | -0.0255442 | 0.0773892 | -0.0230632 | -0.0262870 | -0.0537002 | -0.1128278 | -0.0616051 | -0.1527290 | -0.0977614 | -0.0446228 | 0.0135325 | 0.0081252 | -0.0362573 | 0.0197865 | 0.0121194 | -0.0279555 | 0.0251530 | 0.0299786 | -0.0534076 | -0.0155739 | -0.0477381 | -0.0282911 | -0.1188383 | -0.0203930 | 0.2498600 | 1.0000000 | 0.3844105 | 0.3710744 | 0.2038243 | -0.0069743 | 0.1713379 | 0.0478268 | 0.0397571 | 0.0429090 | 0.0320669 | 0.0426574 | -0.0532586 | 0.0026944 | 0.0734846 | 0.0340007 | 0.0140214 | -0.0406696 |
| Score_MORT_30_HF | Score_MORT_30_HF | -0.0577013 | -0.1067828 | -0.0967183 | -0.1375901 | 0.1078488 | 0.0722015 | 0.1568119 | 0.0644025 | 0.1194712 | 0.0151872 | 0.0665845 | 0.1510853 | 0.0240132 | -0.0005694 | 0.0947303 | 0.1228267 | 0.0845920 | 0.1663036 | 0.0704411 | 0.1281267 | 0.0259241 | 0.0677998 | 0.1553155 | 0.0300940 | -0.0003806 | -0.1136600 | -0.1245435 | -0.0010634 | -0.2195933 | -0.1375348 | -0.0767863 | 0.0045056 | 0.0099705 | 0.0171605 | 0.0170609 | 0.0396301 | -0.0515598 | 0.0425397 | 0.0287674 | 0.0093937 | 0.1007144 | -0.0086007 | -0.1051171 | -0.1492708 | -0.0007242 | 0.3407616 | 0.3844105 | 1.0000000 | 0.4479367 | 0.3147371 | 0.0371596 | 0.2556384 | 0.0679149 | 0.1051698 | 0.0707269 | 0.0383771 | 0.0362529 | -0.0300702 | -0.0086832 | 0.0647245 | 0.0342374 | 0.0465081 | -0.0350247 |
| Score_MORT_30_PN | Score_MORT_30_PN | -0.0046351 | -0.0985660 | -0.1319780 | -0.1231721 | 0.0183801 | -0.0021513 | 0.0384967 | -0.0046993 | -0.0343594 | -0.0880905 | 0.0009844 | 0.0902457 | -0.0613852 | -0.1107731 | -0.0145406 | 0.0156651 | 0.0121242 | 0.0511012 | -0.0036037 | -0.0315360 | -0.0829950 | -0.0009274 | 0.0950135 | -0.0702915 | -0.1100101 | -0.0389080 | -0.1523779 | -0.0761104 | -0.1858291 | -0.0578145 | -0.0559254 | -0.0210473 | -0.0536184 | -0.0204107 | -0.0335808 | 0.0312695 | -0.0787945 | -0.0246311 | -0.0152374 | -0.0939059 | -0.0037918 | -0.0990850 | -0.1710235 | -0.1770482 | 0.0241066 | 0.3309425 | 0.3710744 | 0.4479367 | 1.0000000 | 0.3042563 | 0.0303815 | 0.2301195 | 0.0543554 | 0.0884315 | 0.0217880 | 0.0237048 | 0.0704445 | 0.0089560 | 0.0393676 | 0.0464407 | 0.0029691 | 0.0661595 | -0.0062985 |
| Score_MORT_30_STK | Score_MORT_30_STK | -0.0025521 | -0.0376746 | -0.0478311 | -0.1345921 | -0.0132887 | -0.0165143 | 0.0435642 | 0.0257903 | 0.0159051 | -0.0653713 | -0.0651509 | 0.0337813 | -0.0814552 | -0.1033645 | -0.0275058 | -0.0170135 | 0.0058744 | 0.0476323 | 0.0323273 | 0.0086356 | -0.0564195 | -0.0795298 | 0.0255069 | -0.0760295 | -0.1074875 | 0.0005912 | -0.0988833 | 0.0146397 | -0.0905644 | -0.0463395 | 0.0198257 | -0.0614632 | -0.0251654 | -0.0804707 | -0.0769699 | -0.0311801 | -0.0854897 | -0.0663090 | -0.0457369 | -0.0905922 | -0.0246261 | -0.0787235 | -0.1238916 | -0.1181448 | 0.0211621 | 0.2222539 | 0.2038243 | 0.3147371 | 0.3042563 | 1.0000000 | 0.0687216 | 0.2380935 | 0.0878847 | 0.1014879 | 0.0674377 | 0.0622532 | 0.0725381 | 0.0474896 | 0.0513975 | 0.0492194 | 0.0625191 | 0.1142992 | -0.0272101 |
| Score_PSI_03 | Score_PSI_03 | -0.0058883 | -0.0037334 | 0.0027232 | -0.0339593 | -0.0219430 | -0.0016914 | -0.0192053 | -0.0221085 | 0.0012755 | -0.0300329 | -0.0629368 | -0.0457618 | -0.0490180 | -0.0387585 | -0.0283146 | -0.0236955 | 0.0019106 | -0.0223315 | -0.0128473 | 0.0049870 | -0.0211932 | -0.0709592 | -0.0454692 | -0.0499063 | -0.0363290 | 0.0602674 | 0.0943953 | 0.0451508 | 0.0583644 | 0.0181970 | 0.0648778 | -0.0570477 | -0.0032584 | -0.0287158 | -0.0797702 | -0.0131473 | -0.0659226 | -0.0624234 | -0.0572546 | -0.0044494 | 0.0132861 | -0.0203047 | -0.0378500 | -0.0713006 | 0.0498557 | 0.0415523 | -0.0069743 | 0.0371596 | 0.0303815 | 0.0687216 | 1.0000000 | 0.1353085 | 0.0601750 | 0.0636661 | 0.1407342 | 0.0386211 | 0.0114365 | 0.1186788 | 0.0298580 | 0.0596798 | 0.0999683 | 0.7496827 | 0.0086745 |
| Score_PSI_04 | Score_PSI_04 | 0.0129772 | -0.0449077 | -0.0749480 | -0.0798121 | -0.0320167 | -0.0041080 | -0.0016542 | -0.0185639 | 0.0047573 | -0.0817158 | -0.1272277 | -0.0343542 | -0.0852573 | -0.0857042 | -0.0547617 | -0.0303324 | -0.0063422 | -0.0117228 | -0.0052418 | 0.0228481 | -0.0638071 | -0.1182074 | -0.0450913 | -0.0948401 | -0.1008569 | -0.0541503 | 0.0417007 | 0.0601831 | 0.0638412 | 0.0046272 | 0.1019307 | -0.0326986 | 0.0312780 | -0.0995591 | -0.0937443 | 0.0329404 | -0.0727333 | -0.0956269 | -0.0824732 | -0.0746093 | -0.0270607 | -0.0479955 | -0.1180160 | -0.1762469 | 0.0038509 | 0.2105379 | 0.1713379 | 0.2556384 | 0.2301195 | 0.2380935 | 0.1353085 | 1.0000000 | 0.0601419 | 0.0870693 | 0.1059485 | 0.0523892 | 0.0649032 | 0.0782559 | 0.0123489 | 0.0652098 | 0.1018205 | 0.1589978 | -0.0766302 |
| Score_PSI_06 | Score_PSI_06 | 0.0156305 | 0.0154891 | 0.0017920 | -0.0499866 | -0.0184090 | 0.0301520 | 0.0274588 | 0.0273930 | 0.0312923 | -0.0009316 | -0.0390292 | -0.0062045 | -0.0166476 | -0.0221487 | 0.0061083 | -0.0019546 | 0.0334780 | 0.0152153 | 0.0247052 | 0.0256121 | 0.0098029 | -0.0369162 | -0.0062861 | -0.0054742 | -0.0193730 | 0.0551291 | 0.0225916 | 0.0544576 | 0.0187794 | 0.0392571 | 0.0452860 | -0.0515403 | 0.0059688 | -0.0603629 | -0.0339321 | 0.0208887 | -0.0738364 | -0.0378140 | -0.0074196 | 0.0129386 | -0.0072465 | -0.0035153 | -0.0262166 | -0.0469516 | 0.0505415 | 0.0885083 | 0.0478268 | 0.0679149 | 0.0543554 | 0.0878847 | 0.0601750 | 0.0601419 | 1.0000000 | 0.0724291 | 0.1014588 | 0.0516246 | 0.0351464 | 0.1431056 | 0.0509831 | 0.0527115 | 0.0910520 | 0.1455340 | 0.0456525 |
| Score_PSI_08 | Score_PSI_08 | 0.0386634 | -0.0214412 | -0.0595834 | -0.1384081 | 0.0046992 | -0.0171646 | 0.0159553 | -0.0011887 | -0.0120380 | -0.0348072 | 0.0016051 | 0.0093687 | -0.0686735 | -0.0800058 | -0.0156145 | -0.0080622 | -0.0028885 | 0.0187552 | -0.0039459 | -0.0176853 | -0.0381375 | 0.0027541 | 0.0051695 | -0.0590998 | -0.0826240 | -0.0211509 | -0.0272232 | -0.0311579 | -0.0806516 | -0.0457654 | 0.0102293 | 0.0269822 | -0.0199006 | -0.0017558 | -0.0179990 | -0.0022962 | -0.0368684 | -0.0141213 | -0.0351047 | -0.0292526 | 0.0776413 | -0.0646189 | -0.0522876 | -0.1120007 | 0.0577776 | 0.1010348 | 0.0397571 | 0.1051698 | 0.0884315 | 0.1014879 | 0.0636661 | 0.0870693 | 0.0724291 | 1.0000000 | 0.0052449 | -0.0360093 | 0.0198090 | 0.0394605 | 0.0093444 | 0.0228045 | 0.0127268 | 0.0624052 | -0.0041983 |
| Score_PSI_09 | Score_PSI_09 | -0.0459321 | -0.0182303 | 0.0136214 | -0.0209899 | 0.0835767 | 0.0956403 | 0.0859168 | 0.0897240 | 0.0883277 | 0.0827573 | -0.0069280 | 0.0220979 | 0.0710883 | 0.0787681 | 0.0913137 | 0.0845264 | 0.0956884 | 0.0790559 | 0.0793646 | 0.0859133 | 0.0825062 | -0.0109302 | 0.0262747 | 0.0707774 | 0.0862428 | -0.0336614 | 0.0549990 | 0.0899361 | 0.0224833 | -0.0128057 | 0.0426036 | -0.0140788 | 0.0150699 | -0.0043396 | -0.0245784 | 0.0293689 | -0.0345921 | -0.0257851 | -0.0587674 | 0.0219699 | 0.0066697 | 0.0144757 | -0.0577213 | -0.0691325 | 0.0540124 | 0.0889343 | 0.0429090 | 0.0707269 | 0.0217880 | 0.0674377 | 0.1407342 | 0.1059485 | 0.1014588 | 0.0052449 | 1.0000000 | 0.0885278 | 0.0680540 | 0.1732337 | 0.0519119 | 0.1207438 | 0.2197254 | 0.2331017 | -0.0237660 |
| Score_PSI_10 | Score_PSI_10 | 0.0099239 | 0.0710046 | 0.0894074 | -0.0728092 | -0.0402520 | -0.0267891 | -0.0262995 | -0.0322016 | -0.0190535 | -0.0399586 | -0.0851994 | -0.0051357 | -0.0270735 | 0.0075443 | -0.0452172 | -0.0270956 | -0.0236698 | -0.0330533 | -0.0452983 | -0.0145875 | -0.0316508 | -0.0753999 | -0.0054371 | -0.0157615 | -0.0105068 | 0.0315830 | 0.0009222 | 0.0412713 | 0.0544528 | 0.0031479 | 0.0343100 | -0.0045339 | 0.0331569 | -0.0382369 | -0.0390884 | -0.0176156 | 0.0185638 | -0.0434036 | -0.0265019 | -0.0138235 | -0.0467887 | -0.0298356 | -0.0037803 | 0.0036343 | 0.0813038 | 0.1066619 | 0.0320669 | 0.0383771 | 0.0237048 | 0.0622532 | 0.0386211 | 0.0523892 | 0.0516246 | -0.0360093 | 0.0885278 | 1.0000000 | 0.1626632 | 0.1079488 | 0.2303938 | 0.0453739 | 0.0830134 | 0.2670390 | 0.0497447 |
| Score_PSI_11 | Score_PSI_11 | 0.1103031 | 0.1130121 | 0.0639191 | -0.0689026 | -0.1459187 | -0.1367919 | -0.1292050 | -0.1375039 | -0.1484276 | -0.1668264 | -0.1286917 | -0.0772394 | -0.1585705 | -0.1576927 | -0.1540619 | -0.1428877 | -0.1414366 | -0.1408040 | -0.1557561 | -0.1539039 | -0.1629600 | -0.1233582 | -0.0789391 | -0.1615833 | -0.1710237 | 0.1430203 | -0.0909811 | -0.0625676 | -0.0076797 | 0.0074122 | 0.0397075 | 0.0279871 | -0.0837910 | 0.0137283 | -0.0564170 | -0.0367514 | -0.0310002 | -0.0699820 | -0.0400763 | -0.0675289 | -0.0263715 | -0.0454564 | -0.0301775 | -0.0247178 | 0.1279724 | 0.1037006 | 0.0426574 | 0.0362529 | 0.0704445 | 0.0725381 | 0.0114365 | 0.0649032 | 0.0351464 | 0.0198090 | 0.0680540 | 0.1626632 | 1.0000000 | 0.1172504 | 0.2506376 | -0.0093577 | 0.0464067 | 0.5858033 | 0.1441986 |
| Score_PSI_12 | Score_PSI_12 | 0.0939499 | 0.1047402 | 0.0654692 | -0.0448292 | -0.0669148 | -0.0390690 | -0.0865925 | -0.0537461 | -0.0612852 | -0.0670446 | -0.0807406 | -0.0730231 | -0.0643414 | -0.0393015 | -0.0738343 | -0.0663743 | -0.0380979 | -0.0953948 | -0.0685223 | -0.0536226 | -0.0614646 | -0.0813748 | -0.0813685 | -0.0674588 | -0.0510987 | 0.1038617 | 0.1091949 | 0.0594933 | 0.1653812 | 0.0486693 | 0.0778917 | -0.0606628 | -0.0108546 | -0.0380145 | -0.0757690 | -0.0072258 | -0.0467304 | -0.0642500 | -0.0886155 | 0.0183344 | -0.0426109 | 0.0343065 | -0.0317534 | -0.0036828 | 0.1458258 | 0.0492328 | -0.0532586 | -0.0300702 | 0.0089560 | 0.0474896 | 0.1186788 | 0.0782559 | 0.1431056 | 0.0394605 | 0.1732337 | 0.1079488 | 0.1172504 | 1.0000000 | 0.1742084 | 0.0522204 | 0.1358951 | 0.3821290 | 0.0655557 |
| Score_PSI_13 | Score_PSI_13 | 0.1262199 | 0.1193336 | 0.0603447 | -0.0832889 | -0.1424419 | -0.1430591 | -0.1330731 | -0.1257130 | -0.1424055 | -0.1593474 | -0.1231640 | -0.1033157 | -0.1472927 | -0.1297623 | -0.1614933 | -0.1459851 | -0.1394736 | -0.1388091 | -0.1386420 | -0.1430776 | -0.1636005 | -0.1261494 | -0.1096072 | -0.1458284 | -0.1364494 | 0.0817412 | -0.0160408 | -0.0250714 | 0.0993570 | 0.0559662 | 0.0451515 | -0.0384416 | -0.0087678 | -0.0258097 | -0.0254596 | 0.0030403 | -0.0030434 | -0.0485933 | -0.0494623 | -0.0210595 | -0.0270503 | -0.0293121 | -0.0440578 | 0.0009109 | 0.1334619 | 0.0467554 | 0.0026944 | -0.0086832 | 0.0393676 | 0.0513975 | 0.0298580 | 0.0123489 | 0.0509831 | 0.0093444 | 0.0519119 | 0.2303938 | 0.2506376 | 0.1742084 | 1.0000000 | 0.0056987 | 0.0878105 | 0.4075564 | 0.0949467 |
| Score_PSI_14 | Score_PSI_14 | -0.0138581 | 0.0140012 | 0.0328153 | -0.0662610 | 0.0017770 | -0.0079719 | 0.0092188 | -0.0064032 | -0.0023314 | -0.0164980 | -0.0459005 | -0.0277664 | -0.0205437 | 0.0134380 | -0.0003415 | -0.0029199 | -0.0143285 | -0.0063199 | -0.0063967 | 0.0069610 | -0.0189605 | -0.0505286 | -0.0256155 | -0.0172231 | 0.0003983 | 0.0633107 | 0.0169512 | 0.0723872 | 0.0676972 | 0.0192234 | 0.0652307 | -0.0593032 | 0.0163779 | -0.0166146 | -0.0644595 | -0.0148982 | -0.0228048 | -0.0700502 | -0.0377753 | 0.0392025 | 0.0071556 | 0.0542351 | 0.0086141 | 0.0110118 | 0.0498603 | 0.0454462 | 0.0734846 | 0.0647245 | 0.0464407 | 0.0492194 | 0.0596798 | 0.0652098 | 0.0527115 | 0.0228045 | 0.1207438 | 0.0453739 | -0.0093577 | 0.0522204 | 0.0056987 | 1.0000000 | 0.1176726 | 0.0783006 | -0.0181150 |
| Score_PSI_15 | Score_PSI_15 | -0.0019014 | -0.0158282 | -0.0242059 | -0.0665941 | 0.0329127 | 0.0359419 | 0.0480143 | 0.0443475 | 0.0662158 | 0.0357574 | -0.0007875 | -0.0213211 | 0.0287970 | 0.0262619 | 0.0303266 | 0.0385911 | 0.0425664 | 0.0405341 | 0.0420468 | 0.0649921 | 0.0320446 | 0.0061946 | -0.0196124 | 0.0304634 | 0.0331751 | -0.0393782 | 0.0430812 | 0.0625753 | 0.0374530 | 0.0049833 | 0.0094012 | -0.0270588 | 0.0419068 | -0.0268805 | -0.0051105 | 0.0376698 | 0.0032891 | -0.0047427 | -0.0179532 | -0.0065909 | 0.0285160 | 0.0085622 | 0.0326658 | -0.0279567 | 0.0433809 | 0.0297688 | 0.0340007 | 0.0342374 | 0.0029691 | 0.0625191 | 0.0999683 | 0.1018205 | 0.0910520 | 0.0127268 | 0.2197254 | 0.0830134 | 0.0464067 | 0.1358951 | 0.0878105 | 0.1176726 | 1.0000000 | 0.2021298 | -0.0467071 |
| Score_PSI_90 | Score_PSI_90 | 0.0882354 | 0.0973882 | 0.0626347 | -0.0942117 | -0.1122411 | -0.0871039 | -0.1032756 | -0.1023631 | -0.0911960 | -0.1311288 | -0.1463724 | -0.0977954 | -0.1387904 | -0.1202753 | -0.1263561 | -0.1096342 | -0.0826332 | -0.1131291 | -0.1086247 | -0.0872152 | -0.1202132 | -0.1486299 | -0.1003157 | -0.1390570 | -0.1279752 | 0.1286484 | 0.0534106 | 0.0226639 | 0.0986077 | 0.0401196 | 0.0926476 | -0.0530033 | -0.0312757 | -0.0300979 | -0.1034135 | -0.0243705 | -0.0713756 | -0.0993958 | -0.0875069 | -0.0294786 | -0.0154089 | -0.0284310 | -0.0475558 | -0.0666219 | 0.1604802 | 0.1129695 | 0.0140214 | 0.0465081 | 0.0661595 | 0.1142992 | 0.7496827 | 0.1589978 | 0.1455340 | 0.0624052 | 0.2331017 | 0.2670390 | 0.5858033 | 0.3821290 | 0.4075564 | 0.0783006 | 0.2021298 | 1.0000000 | 0.1036455 |
| Payment_PAYM_90_HIP_KNEE | Payment_PAYM_90_HIP_KNEE | 0.2740999 | 0.2975679 | 0.1808580 | -0.1234255 | -0.2121841 | -0.1893123 | -0.1564699 | -0.1576660 | -0.2089917 | -0.1977109 | -0.0409475 | -0.0612456 | -0.2018848 | -0.2235957 | -0.1981220 | -0.2107792 | -0.1875503 | -0.1799323 | -0.1660056 | -0.2310210 | -0.2069970 | -0.0439537 | -0.0653730 | -0.2108956 | -0.2364653 | 0.1212071 | -0.0627505 | -0.0720431 | -0.0241965 | 0.0264108 | -0.0237710 | 0.0545394 | -0.0815209 | -0.0048449 | -0.0078309 | 0.0044851 | 0.0072647 | -0.0154257 | -0.0092648 | -0.1317922 | -0.0328044 | -0.1296356 | -0.1256363 | 0.0217206 | 0.3410864 | 0.0591548 | -0.0406696 | -0.0350247 | -0.0062985 | -0.0272101 | 0.0086745 | -0.0766302 | 0.0456525 | -0.0041983 | -0.0237660 | 0.0497447 | 0.1441986 | 0.0655557 | 0.0949467 | -0.0181150 | -0.0467071 | 0.1036455 | 1.0000000 |
# Create function to find categorical variables
is_categorical <- function(x) is.factor(x) | is.character(x)
# Apply function to all variables in the dataset
categorical_vars <- sapply(HipKneeClean, is_categorical)
# Print the names of all categorical variables
categorical <- names(HipKneeClean)[categorical_vars]
categorical
## [1] "FacilityId" "EDV" "FacilityName" "State"
# Define the encoding mapping (ignore NAs for now)
encoding_map <- c(
'low' = 1,
'medium' = 2,
'high' = 3,
'very high' = 4
)
# Ordinal (integer) encoding used due to the ordinal nature of this data
# Create a copy of HipKneeClean and name it HipKneeTrain to separate cleaned dataset and the training dataset
HipKneeTrain <- HipKneeClean %>%
mutate(EDV = recode(EDV, !!!encoding_map))
# Print first 20 rows of EDV column in HipKneeClean and HipKneeTrain to ensure proper encoding
cat("HipKneeClean")
## HipKneeClean
print(head(HipKneeClean$EDV, 20))
## [1] "high" "high" "high" "low" "low" "high"
## [7] "low" "medium" "low" "medium" "low" "low"
## [13] "high" "high" "very high" "very high" "low" "high"
## [19] "low" "very high"
cat("HipKneeTrain")
## HipKneeTrain
print(head(HipKneeTrain$EDV, 20))
## [1] 3 3 3 1 1 3 1 2 1 2 1 1 3 3 4 4 1 3 1 4
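As a cross-check on the mapping above, the same integer codes can be produced with an ordered factor. This is only a minimal sketch on a toy vector: `edv_toy` and `edv_levels` are hypothetical names standing in for the real EDV column and its levels, and this is not the method used in our pipeline.

```r
# Toy vector standing in for HipKneeClean$EDV (hypothetical values)
edv_toy <- c("high", "low", "medium", "very high", NA, "low")

# Levels listed from lowest to highest volume, matching encoding_map
edv_levels <- c("low", "medium", "high", "very high")

# An ordered factor stores the level order explicitly
edv_ordered <- factor(edv_toy, levels = edv_levels, ordered = TRUE)

# as.integer() returns the position of each level; NA stays NA
edv_codes <- as.integer(edv_ordered)
print(edv_codes)
```

Keeping the level order in one vector makes it harder for the labels and the integer codes to drift apart if a category is added later.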
# Manually map each state to a three-digit code (alphabetical by state name); codes are stored as zero-padded strings so they are treated as categorical labels rather than ordinal numbers
state_mapping <- c(
"AL" = "001",
"AK" = "002",
"AZ" = "003",
"AR" = "004",
"CA" = "005",
"CO" = "006",
"CT" = "007",
"DE" = "008",
"FL" = "009",
"GA" = "010",
"HI" = "011",
"ID" = "012",
"IL" = "013",
"IN" = "014",
"IA" = "015",
"KS" = "016",
"KY" = "017",
"LA" = "018",
"ME" = "019",
"MD" = "020",
"MA" = "021",
"MI" = "022",
"MN" = "023",
"MS" = "024",
"MO" = "025",
"MT" = "026",
"NE" = "027",
"NV" = "028",
"NH" = "029",
"NJ" = "030",
"NM" = "031",
"NY" = "032",
"NC" = "033",
"ND" = "034",
"OH" = "035",
"OK" = "036",
"OR" = "037",
"PA" = "038",
"RI" = "039",
"SC" = "040",
"SD" = "041",
"TN" = "042",
"TX" = "043",
"UT" = "044",
"VT" = "045",
"VA" = "046",
"WA" = "047",
"WV" = "048",
"WI" = "049",
"WY" = "050"
)
# Create new "StateCode" column with the encoded values
HipKneeTrain <- HipKneeTrain %>%
mutate(StateCode = state_mapping[State])
# Print 100 rows of the "State" and "StateCode" columns to ensure accuracy
print("State and StateCode Columns")
## [1] "State and StateCode Columns"
print(head(HipKneeTrain[c("State", "StateCode")], 100))
## # A tibble: 100 × 2
## State StateCode
## <chr> <chr>
## 1 AL 001
## 2 AL 001
## 3 AL 001
## 4 AL 001
## 5 AL 001
## 6 AL 001
## 7 AL 001
## 8 AL 001
## 9 AL 001
## 10 AL 001
## # ℹ 90 more rows
# Print all unique values in "StateCode" column to ensure accuracy
print("Unique StateCode Values")
## [1] "Unique StateCode Values"
print(unique(HipKneeTrain$StateCode))
## [1] "001" "002" "003" "004" "005" "006" "007" "008" NA "009" "010" "011"
## [13] "012" "013" "014" "015" "016" "017" "018" "019" "020" "021" "022" "023"
## [25] "024" "025" "026" "027" "028" "029" "030" "031" "032" "033" "034" "035"
## [37] "036" "037" "038" "039" "040" "041" "042" "043" "044" "045" "046" "047"
## [49] "048" "049" "050"
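The NA among the unique StateCode values means at least one State value has no entry in the mapping. The sketch below shows one way to build the same 50-state mapping from base R's `state.abb` (which is ordered by state name) and to list unmapped abbreviations. The toy vector `toy_states` and the values "DC" and "PR" are assumptions for illustration; we have not confirmed which values are unmapped in the real data.

```r
# Build the mapping programmatically: state.abb ships with base R
# and is ordered alphabetically by state name (AL, AK, AZ, AR, ...)
state_mapping_auto <- setNames(sprintf("%03d", seq_along(state.abb)), state.abb)

# Toy State column (hypothetical values, not taken from the real dataset)
toy_states <- c("AL", "DC", "WY", "PR")

# Look up the codes; abbreviations missing from the mapping return NA
toy_codes <- unname(state_mapping_auto[toy_states])

# List the abbreviations that produced NA so they can be reviewed
unmapped <- unique(toy_states[is.na(toy_codes)])
print(unmapped)
```

Running the same check on the real State column before dropping rows would show which facilities are excluded by the NA filter later on.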
# Compute correlation matrix
cor_matrix <- cor(HipKneeTrain %>% select_if(is.numeric), use = "pairwise.complete.obs")
# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)
# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), name = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")
# Convert correlation matrix to df
cor_table <- as.data.frame(cor_matrix)
# Add variable names as a column
cor_table$Variable <- rownames(cor_table)
# Reorder columns
cor_table <- cor_table %>%
select(Variable, everything())
# Print table
cor_table %>%
kable(caption = "Table 8. Correlation Coefficients Table") %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| | Variable | PredictedReadmissionRate_HIP_KNEE | HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | EDV | HCP_COVID_19 | IMM_3 | OP_18b | OP_29 | SAFE_USE_OF_OPIOIDS | VTE_1 | Score_COMP_HIP_KNEE | Score_MORT_30_AMI | Score_MORT_30_COPD | Score_MORT_30_HF | Score_MORT_30_PN | Score_MORT_30_STK | Score_PSI_03 | Score_PSI_04 | Score_PSI_06 | Score_PSI_08 | Score_PSI_09 | Score_PSI_10 | Score_PSI_11 | Score_PSI_12 | Score_PSI_13 | Score_PSI_14 | Score_PSI_15 | Payment_PAYM_90_HIP_KNEE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PredictedReadmissionRate_HIP_KNEE | PredictedReadmissionRate_HIP_KNEE | 1.0000000 | -0.2060912 | 0.1986939 | -0.0563082 | -0.0028840 | 0.1295727 | -0.0106510 | 0.1063002 | 0.0654668 | 0.3208550 | 0.0074065 | -0.0794948 | -0.1067828 | -0.0985660 | -0.0376746 | -0.0037334 | -0.0449077 | 0.0154891 | -0.0214412 | -0.0182303 | 0.0710046 | 0.1130121 | 0.1047402 | 0.1193336 | 0.0140012 | -0.0158282 | 0.2975679 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | -0.2060912 | 1.0000000 | -0.2262341 | 0.0302154 | 0.2182505 | -0.2448842 | 0.0865028 | 0.1100002 | -0.0248375 | -0.1091775 | -0.0952111 | -0.0230632 | 0.0300940 | -0.0702915 | -0.0760295 | -0.0499063 | -0.0948401 | -0.0054742 | -0.0590998 | 0.0707774 | -0.0157615 | -0.1615833 | -0.0674588 | -0.1458284 | -0.0172231 | 0.0304634 | -0.2108956 |
| EDV | EDV | 0.1986939 | -0.2262341 | 1.0000000 | 0.1599806 | 0.0038674 | 0.5918897 | 0.0603877 | -0.1223889 | 0.2992859 | -0.0240093 | -0.0687401 | -0.0840621 | -0.2588739 | -0.1904351 | -0.0754281 | 0.0292819 | -0.0438108 | -0.0308989 | -0.1637017 | -0.0181523 | 0.0837980 | 0.0455190 | 0.0749218 | 0.0724936 | 0.0399935 | 0.0144400 | 0.0053946 |
| HCP_COVID_19 | HCP_COVID_19 | -0.0563082 | 0.0302154 | 0.1599806 | 1.0000000 | 0.3203622 | 0.2574291 | 0.1067941 | -0.0812735 | 0.0241622 | -0.0510683 | -0.0869890 | -0.1128278 | -0.1245435 | -0.1523779 | -0.0988833 | 0.0943953 | 0.0417007 | 0.0225916 | -0.0272232 | 0.0549990 | 0.0009222 | -0.0909811 | 0.1091949 | -0.0160408 | 0.0169512 | 0.0430812 | -0.0627505 |
| IMM_3 | IMM_3 | -0.0028840 | 0.2182505 | 0.0038674 | 0.3203622 | 1.0000000 | 0.1105628 | 0.1317922 | 0.0410289 | 0.0906329 | -0.0212916 | -0.0165321 | -0.0616051 | -0.0010634 | -0.0761104 | 0.0146397 | 0.0451508 | 0.0601831 | 0.0544576 | -0.0311579 | 0.0899361 | 0.0412713 | -0.0625676 | 0.0594933 | -0.0250714 | 0.0723872 | 0.0625753 | -0.0720431 |
| OP_18b | OP_18b | 0.1295727 | -0.2448842 | 0.5918897 | 0.2574291 | 0.1105628 | 1.0000000 | 0.0506067 | -0.1400845 | 0.2344307 | -0.0293698 | -0.0678837 | -0.1527290 | -0.2195933 | -0.1858291 | -0.0905644 | 0.0583644 | 0.0638412 | 0.0187794 | -0.0806516 | 0.0224833 | 0.0544528 | -0.0076797 | 0.1653812 | 0.0993570 | 0.0676972 | 0.0374530 | -0.0241965 |
| OP_29 | OP_29 | -0.0106510 | 0.0865028 | 0.0603877 | 0.1067941 | 0.1317922 | 0.0506067 | 1.0000000 | -0.0650231 | 0.1567526 | -0.0096464 | -0.0600569 | 0.0081252 | 0.0099705 | -0.0536184 | -0.0251654 | -0.0032584 | 0.0312780 | 0.0059688 | -0.0199006 | 0.0150699 | 0.0331569 | -0.0837910 | -0.0108546 | -0.0087678 | 0.0163779 | 0.0419068 | -0.0815209 |
| SAFE_USE_OF_OPIOIDS | SAFE_USE_OF_OPIOIDS | 0.1063002 | 0.1100002 | -0.1223889 | -0.0812735 | 0.0410289 | -0.1400845 | -0.0650231 | 1.0000000 | -0.0563373 | -0.0081923 | -0.0643353 | -0.0362573 | 0.0171605 | -0.0204107 | -0.0804707 | -0.0287158 | -0.0995591 | -0.0603629 | -0.0017558 | -0.0043396 | -0.0382369 | 0.0137283 | -0.0380145 | -0.0258097 | -0.0166146 | -0.0268805 | -0.0048449 |
| VTE_1 | VTE_1 | 0.0654668 | -0.0248375 | 0.2992859 | 0.0241622 | 0.0906329 | 0.2344307 | 0.1567526 | -0.0563373 | 1.0000000 | -0.0526925 | -0.0493931 | -0.0282911 | -0.1051171 | -0.1710235 | -0.1238916 | -0.0378500 | -0.1180160 | -0.0262166 | -0.0522876 | -0.0577213 | -0.0037803 | -0.0301775 | -0.0317534 | -0.0440578 | 0.0086141 | 0.0326658 | -0.1256363 |
| Score_COMP_HIP_KNEE | Score_COMP_HIP_KNEE | 0.3208550 | -0.1091775 | -0.0240093 | -0.0510683 | -0.0212916 | -0.0293698 | -0.0096464 | -0.0081923 | -0.0526925 | 1.0000000 | 0.0830479 | -0.0203930 | -0.0007242 | 0.0241066 | 0.0211621 | 0.0498557 | 0.0038509 | 0.0505415 | 0.0577776 | 0.0540124 | 0.0813038 | 0.1279724 | 0.1458258 | 0.1334619 | 0.0498603 | 0.0433809 | 0.3410864 |
| Score_MORT_30_AMI | Score_MORT_30_AMI | 0.0074065 | -0.0952111 | -0.0687401 | -0.0869890 | -0.0165321 | -0.0678837 | -0.0600569 | -0.0643353 | -0.0493931 | 0.0830479 | 1.0000000 | 0.2498600 | 0.3407616 | 0.3309425 | 0.2222539 | 0.0415523 | 0.2105379 | 0.0885083 | 0.1010348 | 0.0889343 | 0.1066619 | 0.1037006 | 0.0492328 | 0.0467554 | 0.0454462 | 0.0297688 | 0.0591548 |
| Score_MORT_30_COPD | Score_MORT_30_COPD | -0.0794948 | -0.0230632 | -0.0840621 | -0.1128278 | -0.0616051 | -0.1527290 | 0.0081252 | -0.0362573 | -0.0282911 | -0.0203930 | 0.2498600 | 1.0000000 | 0.3844105 | 0.3710744 | 0.2038243 | -0.0069743 | 0.1713379 | 0.0478268 | 0.0397571 | 0.0429090 | 0.0320669 | 0.0426574 | -0.0532586 | 0.0026944 | 0.0734846 | 0.0340007 | -0.0406696 |
| Score_MORT_30_HF | Score_MORT_30_HF | -0.1067828 | 0.0300940 | -0.2588739 | -0.1245435 | -0.0010634 | -0.2195933 | 0.0099705 | 0.0171605 | -0.1051171 | -0.0007242 | 0.3407616 | 0.3844105 | 1.0000000 | 0.4479367 | 0.3147371 | 0.0371596 | 0.2556384 | 0.0679149 | 0.1051698 | 0.0707269 | 0.0383771 | 0.0362529 | -0.0300702 | -0.0086832 | 0.0647245 | 0.0342374 | -0.0350247 |
| Score_MORT_30_PN | Score_MORT_30_PN | -0.0985660 | -0.0702915 | -0.1904351 | -0.1523779 | -0.0761104 | -0.1858291 | -0.0536184 | -0.0204107 | -0.1710235 | 0.0241066 | 0.3309425 | 0.3710744 | 0.4479367 | 1.0000000 | 0.3042563 | 0.0303815 | 0.2301195 | 0.0543554 | 0.0884315 | 0.0217880 | 0.0237048 | 0.0704445 | 0.0089560 | 0.0393676 | 0.0464407 | 0.0029691 | -0.0062985 |
| Score_MORT_30_STK | Score_MORT_30_STK | -0.0376746 | -0.0760295 | -0.0754281 | -0.0988833 | 0.0146397 | -0.0905644 | -0.0251654 | -0.0804707 | -0.1238916 | 0.0211621 | 0.2222539 | 0.2038243 | 0.3147371 | 0.3042563 | 1.0000000 | 0.0687216 | 0.2380935 | 0.0878847 | 0.1014879 | 0.0674377 | 0.0622532 | 0.0725381 | 0.0474896 | 0.0513975 | 0.0492194 | 0.0625191 | -0.0272101 |
| Score_PSI_03 | Score_PSI_03 | -0.0037334 | -0.0499063 | 0.0292819 | 0.0943953 | 0.0451508 | 0.0583644 | -0.0032584 | -0.0287158 | -0.0378500 | 0.0498557 | 0.0415523 | -0.0069743 | 0.0371596 | 0.0303815 | 0.0687216 | 1.0000000 | 0.1353085 | 0.0601750 | 0.0636661 | 0.1407342 | 0.0386211 | 0.0114365 | 0.1186788 | 0.0298580 | 0.0596798 | 0.0999683 | 0.0086745 |
| Score_PSI_04 | Score_PSI_04 | -0.0449077 | -0.0948401 | -0.0438108 | 0.0417007 | 0.0601831 | 0.0638412 | 0.0312780 | -0.0995591 | -0.1180160 | 0.0038509 | 0.2105379 | 0.1713379 | 0.2556384 | 0.2301195 | 0.2380935 | 0.1353085 | 1.0000000 | 0.0601419 | 0.0870693 | 0.1059485 | 0.0523892 | 0.0649032 | 0.0782559 | 0.0123489 | 0.0652098 | 0.1018205 | -0.0766302 |
| Score_PSI_06 | Score_PSI_06 | 0.0154891 | -0.0054742 | -0.0308989 | 0.0225916 | 0.0544576 | 0.0187794 | 0.0059688 | -0.0603629 | -0.0262166 | 0.0505415 | 0.0885083 | 0.0478268 | 0.0679149 | 0.0543554 | 0.0878847 | 0.0601750 | 0.0601419 | 1.0000000 | 0.0724291 | 0.1014588 | 0.0516246 | 0.0351464 | 0.1431056 | 0.0509831 | 0.0527115 | 0.0910520 | 0.0456525 |
| Score_PSI_08 | Score_PSI_08 | -0.0214412 | -0.0590998 | -0.1637017 | -0.0272232 | -0.0311579 | -0.0806516 | -0.0199006 | -0.0017558 | -0.0522876 | 0.0577776 | 0.1010348 | 0.0397571 | 0.1051698 | 0.0884315 | 0.1014879 | 0.0636661 | 0.0870693 | 0.0724291 | 1.0000000 | 0.0052449 | -0.0360093 | 0.0198090 | 0.0394605 | 0.0093444 | 0.0228045 | 0.0127268 | -0.0041983 |
| Score_PSI_09 | Score_PSI_09 | -0.0182303 | 0.0707774 | -0.0181523 | 0.0549990 | 0.0899361 | 0.0224833 | 0.0150699 | -0.0043396 | -0.0577213 | 0.0540124 | 0.0889343 | 0.0429090 | 0.0707269 | 0.0217880 | 0.0674377 | 0.1407342 | 0.1059485 | 0.1014588 | 0.0052449 | 1.0000000 | 0.0885278 | 0.0680540 | 0.1732337 | 0.0519119 | 0.1207438 | 0.2197254 | -0.0237660 |
| Score_PSI_10 | Score_PSI_10 | 0.0710046 | -0.0157615 | 0.0837980 | 0.0009222 | 0.0412713 | 0.0544528 | 0.0331569 | -0.0382369 | -0.0037803 | 0.0813038 | 0.1066619 | 0.0320669 | 0.0383771 | 0.0237048 | 0.0622532 | 0.0386211 | 0.0523892 | 0.0516246 | -0.0360093 | 0.0885278 | 1.0000000 | 0.1626632 | 0.1079488 | 0.2303938 | 0.0453739 | 0.0830134 | 0.0497447 |
| Score_PSI_11 | Score_PSI_11 | 0.1130121 | -0.1615833 | 0.0455190 | -0.0909811 | -0.0625676 | -0.0076797 | -0.0837910 | 0.0137283 | -0.0301775 | 0.1279724 | 0.1037006 | 0.0426574 | 0.0362529 | 0.0704445 | 0.0725381 | 0.0114365 | 0.0649032 | 0.0351464 | 0.0198090 | 0.0680540 | 0.1626632 | 1.0000000 | 0.1172504 | 0.2506376 | -0.0093577 | 0.0464067 | 0.1441986 |
| Score_PSI_12 | Score_PSI_12 | 0.1047402 | -0.0674588 | 0.0749218 | 0.1091949 | 0.0594933 | 0.1653812 | -0.0108546 | -0.0380145 | -0.0317534 | 0.1458258 | 0.0492328 | -0.0532586 | -0.0300702 | 0.0089560 | 0.0474896 | 0.1186788 | 0.0782559 | 0.1431056 | 0.0394605 | 0.1732337 | 0.1079488 | 0.1172504 | 1.0000000 | 0.1742084 | 0.0522204 | 0.1358951 | 0.0655557 |
| Score_PSI_13 | Score_PSI_13 | 0.1193336 | -0.1458284 | 0.0724936 | -0.0160408 | -0.0250714 | 0.0993570 | -0.0087678 | -0.0258097 | -0.0440578 | 0.1334619 | 0.0467554 | 0.0026944 | -0.0086832 | 0.0393676 | 0.0513975 | 0.0298580 | 0.0123489 | 0.0509831 | 0.0093444 | 0.0519119 | 0.2303938 | 0.2506376 | 0.1742084 | 1.0000000 | 0.0056987 | 0.0878105 | 0.0949467 |
| Score_PSI_14 | Score_PSI_14 | 0.0140012 | -0.0172231 | 0.0399935 | 0.0169512 | 0.0723872 | 0.0676972 | 0.0163779 | -0.0166146 | 0.0086141 | 0.0498603 | 0.0454462 | 0.0734846 | 0.0647245 | 0.0464407 | 0.0492194 | 0.0596798 | 0.0652098 | 0.0527115 | 0.0228045 | 0.1207438 | 0.0453739 | -0.0093577 | 0.0522204 | 0.0056987 | 1.0000000 | 0.1176726 | -0.0181150 |
| Score_PSI_15 | Score_PSI_15 | -0.0158282 | 0.0304634 | 0.0144400 | 0.0430812 | 0.0625753 | 0.0374530 | 0.0419068 | -0.0268805 | 0.0326658 | 0.0433809 | 0.0297688 | 0.0340007 | 0.0342374 | 0.0029691 | 0.0625191 | 0.0999683 | 0.1018205 | 0.0910520 | 0.0127268 | 0.2197254 | 0.0830134 | 0.0464067 | 0.1358951 | 0.0878105 | 0.1176726 | 1.0000000 | -0.0467071 |
| Payment_PAYM_90_HIP_KNEE | Payment_PAYM_90_HIP_KNEE | 0.2975679 | -0.2108956 | 0.0053946 | -0.0627505 | -0.0720431 | -0.0241965 | -0.0815209 | -0.0048449 | -0.1256363 | 0.3410864 | 0.0591548 | -0.0406696 | -0.0350247 | -0.0062985 | -0.0272101 | 0.0086745 | -0.0766302 | 0.0456525 | -0.0041983 | -0.0237660 | 0.0497447 | 0.1441986 | 0.0655557 | 0.0949467 | -0.0181150 | -0.0467071 | 1.0000000 |
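In Table 8, the strongest correlations with PredictedReadmissionRate_HIP_KNEE are Score_COMP_HIP_KNEE (0.32) and Payment_PAYM_90_HIP_KNEE (0.30), followed by the HCAHPS rating (-0.21) and EDV (0.20). A wide matrix makes this hard to read, so one option is to pull out the target column and rank it by absolute value. The sketch below does this on simulated data; `toy`, `target`, and `x1` to `x3` are hypothetical names, not columns from our dataset.

```r
# Simulated data standing in for the numeric columns of HipKneeTrain
set.seed(1)
toy <- data.frame(target = rnorm(50))
toy$x1 <- toy$target * 0.8 + rnorm(50, sd = 0.5)
toy$x2 <- rnorm(50)
toy$x3 <- -toy$target + rnorm(50, sd = 2)

# Same call pattern as the correlation matrix above
cm <- cor(toy, use = "pairwise.complete.obs")

# Keep only the correlations with the target, dropping the target itself
target_cor <- cm[, "target"]
target_cor <- target_cor[names(target_cor) != "target"]

# Sort by absolute correlation, strongest first
ranked <- target_cor[order(abs(target_cor), decreasing = TRUE)]
print(round(ranked, 3))
```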
# Remove all NA values in target variable "PredictedReadmissionRate_HIP_KNEE"
HipKneeTrain <- HipKneeTrain %>% filter(!is.na(PredictedReadmissionRate_HIP_KNEE))
# Remove all NA values in the "State", "StateCode", and "FacilityName" columns
HipKneeTrain <- HipKneeTrain %>% drop_na(State, StateCode, FacilityName)
# Print number of remaining variables and observations
dimensions <- dim(HipKneeTrain)
cat("Number of variables:", dimensions[2], "\n")
## Number of variables: 31
cat("Number of observations:", dimensions[1], "\n")
## Number of observations: 1833
We removed the one facility with an NA value in FacilityName, which was also the observation with a missing State value.
# Calculate missing values
missing_values_summary <- HipKneeTrain %>%
summarise(across(everything(), ~ sum(is.na(.)))) %>%
pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTrain)) * 100)
# Print table
missing_values_summary %>%
kable(caption = "Table 7. Missing Values Summary") %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| Variable | Missing_Count | Missing_Percentage |
|---|---|---|
| FacilityId | 0 | 0.0000000 |
| PredictedReadmissionRate_HIP_KNEE | 0 | 0.0000000 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 33 | 1.8003273 |
| EDV | 90 | 4.9099836 |
| HCP_COVID_19 | 16 | 0.8728860 |
| IMM_3 | 16 | 0.8728860 |
| OP_18b | 75 | 4.0916530 |
| OP_29 | 222 | 12.1112930 |
| SAFE_USE_OF_OPIOIDS | 69 | 3.7643208 |
| VTE_1 | 994 | 54.2280415 |
| Score_COMP_HIP_KNEE | 40 | 2.1822149 |
| Score_MORT_30_AMI | 405 | 22.0949264 |
| Score_MORT_30_COPD | 247 | 13.4751773 |
| Score_MORT_30_HF | 141 | 7.6923077 |
| Score_MORT_30_PN | 125 | 6.8194217 |
| Score_MORT_30_STK | 284 | 15.4937261 |
| Score_PSI_03 | 8 | 0.4364430 |
| Score_PSI_04 | 575 | 31.3693399 |
| Score_PSI_06 | 2 | 0.1091107 |
| Score_PSI_08 | 2 | 0.1091107 |
| Score_PSI_09 | 2 | 0.1091107 |
| Score_PSI_10 | 41 | 2.2367703 |
| Score_PSI_11 | 40 | 2.1822149 |
| Score_PSI_12 | 2 | 0.1091107 |
| Score_PSI_13 | 42 | 2.2913257 |
| Score_PSI_14 | 87 | 4.7463175 |
| Score_PSI_15 | 29 | 1.5821058 |
| FacilityName | 0 | 0.0000000 |
| State | 0 | 0.0000000 |
| Payment_PAYM_90_HIP_KNEE | 42 | 2.2913257 |
| StateCode | 0 | 0.0000000 |
# Calculate median for columns with <5% missing values
numeric_vars_low_missing <- c("HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "EDV", "HCP_COVID_19", "IMM_3", "OP_18b", "SAFE_USE_OF_OPIOIDS", "Score_COMP_HIP_KNEE", "Score_PSI_03", "Score_PSI_06", "Score_PSI_08", "Score_PSI_09", "Score_PSI_10", "Score_PSI_11", "Score_PSI_12", "Score_PSI_13", "Score_PSI_14", "Score_PSI_15", "Payment_PAYM_90_HIP_KNEE")
for (var in numeric_vars_low_missing) {
HipKneeTrain[[var]][is.na(HipKneeTrain[[var]])] <- median(HipKneeTrain[[var]], na.rm = TRUE)
}
# Select high missingness variables for KNN imputation
vars_for_knn <- c("VTE_1", "Score_MORT_30_AMI", "Score_MORT_30_COPD", "Score_MORT_30_HF", "Score_MORT_30_PN", "Score_MORT_30_STK", "Score_PSI_04", "OP_29")
# Perform KNN imputation
HipKneeTrain_knn <- kNN(HipKneeTrain, variable = vars_for_knn, k = 5)
# Remove columns created by the KNN function
HipKneeTrain_knn <- HipKneeTrain_knn %>% select(-ends_with("_imp"))
# Update HipKneeTrain with imputed values
HipKneeTrain[vars_for_knn] <- HipKneeTrain_knn[vars_for_knn]
Is this a good method for imputing missing values? Most of our variables had low missingness (below 5%), so we used median imputation for them. For the variables with higher missingness, we used KNN imputation with k = 5. Are there other approaches that would be more appropriate here?
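The two variable lists above were typed by hand from the missing values table. A minimal sketch of how the split could instead be derived from the 5% threshold is shown below, using simulated data. The names `toy`, `low_missing`, and `high_missing` are hypothetical, and the high-missingness columns are only identified here, not imputed.

```r
# Simulated data: column a is 2% missing, column b is 30% missing
set.seed(42)
n <- 100
toy <- data.frame(a = rnorm(n), b = rnorm(n))
toy$a[c(3, 57)] <- NA
toy$b[1:30] <- NA

# Percentage of missing values per column
miss_pct <- colMeans(is.na(toy)) * 100

# Split columns at the 5% threshold used in this analysis
low_missing <- names(miss_pct)[miss_pct > 0 & miss_pct < 5]
high_missing <- names(miss_pct)[miss_pct >= 5]

# Median imputation for the low-missingness columns only
for (v in low_missing) {
  toy[[v]][is.na(toy[[v]])] <- median(toy[[v]], na.rm = TRUE)
}

print(low_missing)   # candidates for median imputation
print(high_missing)  # candidates for KNN imputation
```

Deriving the lists this way keeps the code consistent with the stated threshold if the data snapshot changes.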
# Calculate missing values
missing_values_summary <- HipKneeTrain %>%
summarise(across(everything(), ~ sum(is.na(.)))) %>%
pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTrain)) * 100)
# Print table
missing_values_summary %>%
kable(caption = "Table 7. Missing Values Summary After Imputation") %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| Variable | Missing_Count | Missing_Percentage |
|---|---|---|
| FacilityId | 0 | 0 |
| PredictedReadmissionRate_HIP_KNEE | 0 | 0 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 0 | 0 |
| EDV | 0 | 0 |
| HCP_COVID_19 | 0 | 0 |
| IMM_3 | 0 | 0 |
| OP_18b | 0 | 0 |
| OP_29 | 0 | 0 |
| SAFE_USE_OF_OPIOIDS | 0 | 0 |
| VTE_1 | 0 | 0 |
| Score_COMP_HIP_KNEE | 0 | 0 |
| Score_MORT_30_AMI | 0 | 0 |
| Score_MORT_30_COPD | 0 | 0 |
| Score_MORT_30_HF | 0 | 0 |
| Score_MORT_30_PN | 0 | 0 |
| Score_MORT_30_STK | 0 | 0 |
| Score_PSI_03 | 0 | 0 |
| Score_PSI_04 | 0 | 0 |
| Score_PSI_06 | 0 | 0 |
| Score_PSI_08 | 0 | 0 |
| Score_PSI_09 | 0 | 0 |
| Score_PSI_10 | 0 | 0 |
| Score_PSI_11 | 0 | 0 |
| Score_PSI_12 | 0 | 0 |
| Score_PSI_13 | 0 | 0 |
| Score_PSI_14 | 0 | 0 |
| Score_PSI_15 | 0 | 0 |
| FacilityName | 0 | 0 |
| State | 0 | 0 |
| Payment_PAYM_90_HIP_KNEE | 0 | 0 |
| StateCode | 0 | 0 |
# Average death rates amongst mortality variables and create new column "Score_Ovr_MORT"
HipKneeTrain$Score_Ovr_MORT <- rowMeans(HipKneeTrain[, c("Score_MORT_30_AMI",
"Score_MORT_30_COPD",
"Score_MORT_30_HF",
"Score_MORT_30_PN",
"Score_MORT_30_STK")],
na.rm = TRUE)
# Remove old mortality columns
HipKneeTrain <- HipKneeTrain[, !(names(HipKneeTrain) %in% c("Score_MORT_30_AMI",
"Score_MORT_30_COPD",
"Score_MORT_30_HF",
"Score_MORT_30_PN",
"Score_MORT_30_STK"))]
# Compute correlation matrix
cor_matrix <- cor(HipKneeTrain %>% select_if(is.numeric), use = "pairwise.complete.obs")
# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)
# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), name = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Figure 5. Correlation Heatmap of Numeric Variables (Mortality Scores Averaged)")
We use the most recent snapshot, from 04/24/2024, as our test set. Evaluating on data the model has not seen helps show whether it generalizes and remains useful for future analyses.
# Set the directory for the data files
filepath <- "/Users/adelinecasali/Desktop/hospitals_04_2024/"
# List the files in the directory whose names end in "Hospital.csv"
files <- list.files(path = filepath, pattern = "Hospital\\.csv$")
# Iterate through each file in the list
for (f in seq_along(files)) {
# Read the CSV, clean column names to upper camel case, and store in "dat"
dat <- clean_names(read_csv(paste0(filepath, files[f]),
show_col_types = FALSE),
case = "upper_camel")
# Remove ".Hospital.csv" part of the file names to create variable name
filename <- gsub(".Hospital\\.csv", "", files[f])
# Assign data to a variable with the above created name
assign(filename, dat)
}
# Create a df of file names without ".Hospital.csv"
files <- gsub(".Hospital\\.csv", "", files) %>% data.frame()
# Set column name of the df to "File Name"
names(files) <- "File Name"
files %>%
kable(
format = "html",
caption = "Table 1. List of hospital-level data files.") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| File Name |
|---|
| Complications_and_Deaths |
| FY_2024_HAC_Reduction_Program |
| FY_2024_Hospital_Readmissions_Reduction_Program |
| HCAHPS |
| Healthcare_Associated_Infections |
| Maternal_Health |
| Medicare_Hospital_Spending_Per_Patient |
| Outpatient_Imaging_Efficiency |
| Payment_and_Value_of_Care |
| Timely_and_Effective_Care |
| Unplanned_Hospital_Visits |
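The loop above creates one global variable per file with `assign()`. An alternative is to keep the tables in a single named list, which is easier to iterate over and inspect. The sketch below is self-contained and writes small placeholder CSVs to a temporary directory. The file names, the "-" separator before "Hospital.csv", and the use of base `read.csv` instead of `read_csv` plus `clean_names` are all assumptions for illustration.

```r
# Create a temporary directory with placeholder files (hypothetical names)
tmp_dir <- tempfile("hospital_files_")
dir.create(tmp_dir)
write.csv(data.frame(FacilityId = c("010001", "010005")),
          file.path(tmp_dir, "HCAHPS-Hospital.csv"), row.names = FALSE)
write.csv(data.frame(FacilityId = "010001"),
          file.path(tmp_dir, "Maternal_Health-Hospital.csv"), row.names = FALSE)
write.csv(data.frame(x = 1),
          file.path(tmp_dir, "HCAHPS-State.csv"), row.names = FALSE)

# Match only files ending in "Hospital.csv" (dot escaped, end anchored)
paths <- list.files(tmp_dir, pattern = "Hospital\\.csv$", full.names = TRUE)

# Read every file as character so IDs such as "010001" keep leading zeros
tables <- lapply(paths, read.csv, colClasses = "character")

# Name each list element after its file, minus the "Hospital.csv" suffix
names(tables) <- sub(".Hospital\\.csv$", "", basename(paths))

print(names(tables))
```

Individual tables are then available as, for example, `tables$HCAHPS`, without adding objects to the global environment.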
# Display first 10 rows of FY_2024_Hospital_Readmissions_Reduction_Program
head(FY_2024_Hospital_Readmissions_Reduction_Program,10)
## # A tibble: 10 × 12
## FacilityName FacilityId State MeasureName NumberOfDischarges Footnote
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 SOUTHEAST HEALTH ME… 010001 AL READM-30-H… N/A NA
## 2 SOUTHEAST HEALTH ME… 010001 AL READM-30-H… 616 NA
## 3 SOUTHEAST HEALTH ME… 010001 AL READM-30-A… 274 NA
## 4 SOUTHEAST HEALTH ME… 010001 AL READM-30-P… 404 NA
## 5 SOUTHEAST HEALTH ME… 010001 AL READM-30-C… 126 NA
## 6 SOUTHEAST HEALTH ME… 010001 AL READM-30-C… 117 NA
## 7 MARSHALL MEDICAL CE… 010005 AL READM-30-A… N/A 1
## 8 MARSHALL MEDICAL CE… 010005 AL READM-30-C… 137 NA
## 9 MARSHALL MEDICAL CE… 010005 AL READM-30-P… 285 NA
## 10 MARSHALL MEDICAL CE… 010005 AL READM-30-H… 129 NA
## # ℹ 6 more variables: ExcessReadmissionRatio <chr>,
## # PredictedReadmissionRate <chr>, ExpectedReadmissionRate <chr>,
## # NumberOfReadmissions <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## Footnote
## 12077
# Use the function "replace_with_na_all()" to replace aberrant values with NA
FY_2024_Hospital_Readmissions_Reduction_Program <- replace_with_na_all(FY_2024_Hospital_Readmissions_Reduction_Program, condition = ~ .x == "N/A")
# Replace "Too Few to Report" values with the placeholder "5" using gsub
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- gsub("Too Few to Report", "5", FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)
# Check first 10 rows to confirm that it worked
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions, 10)
## [1] "5" "149" "32" "68" "11" "20" NA "14" "40" "24"
# NumberOfReadmissions has to be converted to numeric before the placeholder values can be replaced with integers
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- as.numeric(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)
# Find all values of "5" in NumberOfReadmissions
fives <- which(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions == 5)
# Replace values of "5" with random integers from 1 - 10
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions[fives] <- sample(1:10, length(fives), replace = TRUE)
# Check the first 20 rows to see if this was applied correctly
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions,20)
## [1] 3 149 32 68 11 20 NA 14 40 24 1 NA 7 21 15 83 36 75 5
## [20] NA
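Two things are worth noting about the placeholder step above: using "5" as a marker would also overwrite any facility that genuinely reported 5 readmissions, and `sample()` gives different values on every run unless a seed is set. The sketch below shows one way to handle both on a toy vector. The name `raw`, the values in it, and the seed are hypothetical and are not part of our pipeline.

```r
# Fix the random seed so the replacement values are reproducible
set.seed(2024)

# Toy column with a suppressed value, a genuine 5, and an "N/A"
raw <- c("Too Few to Report", "149", "5", "N/A", "32")

# Flag the suppressed entries before converting, instead of using a marker value
too_few <- !is.na(raw) & raw == "Too Few to Report"

# Non-numeric strings become NA; the coercion warning is expected here
counts <- suppressWarnings(as.numeric(raw))

# Replace only the flagged entries with random integers from 1 to 10
counts[too_few] <- sample(1:10, sum(too_few), replace = TRUE)

print(counts)
```

With the logical flag, the genuine 5 in position 3 is left untouched and only the suppressed entry is randomized.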
# Selecting the columns to convert
columns_to_convert <- c("NumberOfDischarges", "ExcessReadmissionRatio", "PredictedReadmissionRate", "ExpectedReadmissionRate", "NumberOfReadmissions")
# Use mutate_at to convert the specified columns to numeric
FY_2024_Hospital_Readmissions_Reduction_Program <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
mutate_at(vars(one_of(columns_to_convert)), as.numeric)
# Print the structure of the dataframe to check the changes
str(FY_2024_Hospital_Readmissions_Reduction_Program)
## tibble [18,774 × 12] (S3: tbl_df/tbl/data.frame)
## $ FacilityName : chr [1:18774] "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" ...
## $ FacilityId : chr [1:18774] "010001" "010001" "010001" "010001" ...
## $ State : chr [1:18774] "AL" "AL" "AL" "AL" ...
## $ MeasureName : chr [1:18774] "READM-30-HIP-KNEE-HRRP" "READM-30-HF-HRRP" "READM-30-AMI-HRRP" "READM-30-PN-HRRP" ...
## $ NumberOfDischarges : num [1:18774] NA 616 274 404 126 117 NA 137 285 129 ...
## $ Footnote : num [1:18774] NA NA NA NA NA NA 1 NA NA NA ...
## $ ExcessReadmissionRatio : num [1:18774] 0.892 1.1 0.933 0.987 0.952 ...
## $ PredictedReadmissionRate: num [1:18774] 3.53 23.13 12.9 17.05 9.81 ...
## $ ExpectedReadmissionRate : num [1:18774] 3.96 21.02 13.83 17.28 10.31 ...
## $ NumberOfReadmissions : num [1:18774] 3 149 32 68 11 20 NA 14 40 24 ...
## $ StartDate : chr [1:18774] "07/01/2019" "07/01/2019" "07/01/2019" "07/01/2019" ...
## $ EndDate : chr [1:18774] "06/30/2022" "06/30/2022" "06/30/2022" "06/30/2022" ...
# Strip the "READM-30-" prefix and "-HRRP" suffix from the measure names
FY_2024_Hospital_Readmissions_Reduction_Program <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(MeasureName = gsub("READM-30-", "", MeasureName)) %>%
  mutate(MeasureName = gsub("-HRRP", "", MeasureName))
# Pivot to one row per facility, with one column per measure and value type
readmissionsClean <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  pivot_wider(
    names_from = MeasureName,
    values_from = c(NumberOfDischarges, ExcessReadmissionRatio, PredictedReadmissionRate, ExpectedReadmissionRate, NumberOfReadmissions),
    id_cols = c(FacilityName, FacilityId, State, StartDate, EndDate)
  )
# Check the new dataframe
dim(readmissionsClean)
## [1] 3129 35
head(readmissionsClean)
## # A tibble: 6 × 35
## FacilityName FacilityId State StartDate EndDate NumberOfDischarges_H…¹
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 SOUTHEAST HEALTH ME… 010001 AL 07/01/20… 06/30/… NA
## 2 MARSHALL MEDICAL CE… 010005 AL 07/01/20… 06/30/… NA
## 3 NORTH ALABAMA MEDIC… 010006 AL 07/01/20… 06/30/… NA
## 4 MIZELL MEMORIAL HOS… 010007 AL 07/01/20… 06/30/… NA
## 5 CRENSHAW COMMUNITY … 010008 AL 07/01/20… 06/30/… NA
## 6 ST. VINCENT'S EAST 010011 AL 07/01/20… 06/30/… NA
## # ℹ abbreviated name: ¹`NumberOfDischarges_HIP-KNEE`
## # ℹ 29 more variables: NumberOfDischarges_HF <dbl>,
## # NumberOfDischarges_AMI <dbl>, NumberOfDischarges_PN <dbl>,
## # NumberOfDischarges_CABG <dbl>, NumberOfDischarges_COPD <dbl>,
## # `ExcessReadmissionRatio_HIP-KNEE` <dbl>, ExcessReadmissionRatio_HF <dbl>,
## # ExcessReadmissionRatio_AMI <dbl>, ExcessReadmissionRatio_PN <dbl>,
## # ExcessReadmissionRatio_CABG <dbl>, ExcessReadmissionRatio_COPD <dbl>, …
# Keep only the identifying columns and the hip/knee measures
readmissionsClean <- readmissionsClean %>%
  select(FacilityName, FacilityId, State, matches("HIP-KNEE$"))
# Display first 10 rows of HCAHPS
head(HCAHPS,10)
## # A tibble: 10 × 22
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 6 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 7 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 8 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 9 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 10 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## # ℹ 15 more variables: TelephoneNumber <chr>, HcahpsMeasureId <chr>,
## # HcahpsQuestion <chr>, HcahpsAnswerDescription <chr>,
## # PatientSurveyStarRating <chr>, PatientSurveyStarRatingFootnote <dbl>,
## # HcahpsAnswerPercent <chr>, HcahpsAnswerPercentFootnote <chr>,
## # HcahpsLinearMeanValue <chr>, NumberOfCompletedSurveys <chr>,
## # NumberOfCompletedSurveysFootnote <chr>, SurveyResponseRatePercent <chr>,
## # SurveyResponseRatePercentFootnote <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- HCAHPS %>%
select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## PatientSurveyStarRatingFootnote
## 430641
# Removing all footnote columns
HCAHPS <- HCAHPS %>%
select(-ends_with("footnote"))
# Replacing all "Not Applicable" with NA
HCAHPS <- as.data.frame(sapply(HCAHPS, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))
# Replacing all "Not Available" with NA
HCAHPS <- as.data.frame(sapply(HCAHPS, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
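One caveat with the pattern above: sapply() simplifies equal-length columns into a single matrix, so as.data.frame(sapply(...)) turns every column into character, including numeric ones. That is harmless here because the columns are converted to numeric later, but a type-preserving variant can be sketched as below. The toy data frame and its column names are hypothetical, and both placeholder strings are handled in one pass.

```r
# Hypothetical toy data frame; column names are illustrative only
toy <- data.frame(
  Score = c("12.5", "Not Available", "9.1"),
  Rating = c("Not Applicable", "4", "3"),
  Footnote = c(1, NA, 5),
  stringsAsFactors = FALSE
)

sentinels <- c("Not Applicable", "Not Available")

# Assigning into toy[] keeps the data.frame class and each column's type
toy[] <- lapply(toy, function(x) {
  if (is.character(x)) {
    x[x %in% sentinels] <- NA
  }
  x
})

# Footnote stays numeric; the placeholder strings are now NA
str(toy)
```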
# Pivot to one row per facility, with one column per HCAHPS measure and value type
HCAHPSClean <- HCAHPS %>%
  pivot_wider(
    names_from = HcahpsMeasureId,
    values_from = c(PatientSurveyStarRating, HcahpsAnswerPercent, HcahpsLinearMeanValue, SurveyResponseRatePercent),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(HCAHPSClean)
## [1] 4814 375
head(HCAHPSClean)
## # A tibble: 6 × 375
## FacilityName FacilityId State PatientSurveyStarRat…¹ PatientSurveyStarRat…²
## <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEAL… 010001 AL <NA> <NA>
## 2 MARSHALL MEDIC… 010005 AL <NA> <NA>
## 3 NORTH ALABAMA … 010006 AL <NA> <NA>
## 4 MIZELL MEMORIA… 010007 AL <NA> <NA>
## 5 CRENSHAW COMMU… 010008 AL <NA> <NA>
## 6 ST. VINCENT'S … 010011 AL <NA> <NA>
## # ℹ abbreviated names: ¹PatientSurveyStarRating_H_COMP_1_A_P,
## # ²PatientSurveyStarRating_H_COMP_1_SN_P
## # ℹ 370 more variables: PatientSurveyStarRating_H_COMP_1_U_P <chr>,
## # PatientSurveyStarRating_H_COMP_1_LINEAR_SCORE <chr>,
## # PatientSurveyStarRating_H_COMP_1_STAR_RATING <chr>,
## # PatientSurveyStarRating_H_NURSE_RESPECT_A_P <chr>,
## # PatientSurveyStarRating_H_NURSE_RESPECT_SN_P <chr>, …
# Display first 10 rows of Timely_and_Effective_Care
head(Timely_and_Effective_Care,10)
## # A tibble: 10 × 16
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 6 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 7 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 8 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 9 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 10 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## # ℹ 9 more variables: TelephoneNumber <chr>, Condition <chr>, MeasureId <chr>,
## # MeasureName <chr>, Score <chr>, Sample <chr>, Footnote <chr>,
## # StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Timely_and_Effective_Care %>%
select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()
# Replacing all "Not Applicable" with NA
Timely_and_Effective_Care <- as.data.frame(sapply(Timely_and_Effective_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))
# Replacing all "Not Available" with NA
Timely_and_Effective_Care <- as.data.frame(sapply(Timely_and_Effective_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
# Pivot to one row per facility, with one Score column per measure
careClean <- Timely_and_Effective_Care %>%
  pivot_wider(
    names_from = MeasureId,
    values_from = c(Score),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(careClean)
## [1] 4677 26
head(careClean)
## # A tibble: 6 × 26
## FacilityName FacilityId State EDV ED_2_Strata_1 ED_2_Strata_2 HCP_COVID_19
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEA… 010001 AL high <NA> <NA> 80.7
## 2 MARSHALL MEDI… 010005 AL high 148 105 79.8
## 3 NORTH ALABAMA… 010006 AL high <NA> <NA> 79
## 4 MIZELL MEMORI… 010007 AL low <NA> <NA> 57.9
## 5 CRENSHAW COMM… 010008 AL low <NA> <NA> 81.2
## 6 ST. VINCENT'S… 010011 AL high <NA> <NA> 88
## # ℹ 19 more variables: IMM_3 <chr>, OP_18b <chr>, OP_18c <chr>, OP_22 <chr>,
## # OP_23 <chr>, OP_29 <chr>, OP_31 <chr>, SAFE_USE_OF_OPIOIDS <chr>,
## # SEP_1 <chr>, SEP_SH_3HR <chr>, SEP_SH_6HR <chr>, SEV_SEP_3HR <chr>,
## # SEV_SEP_6HR <chr>, STK_02 <chr>, STK_03 <chr>, STK_05 <chr>, STK_06 <chr>,
## # VTE_1 <chr>, VTE_2 <chr>
# Display first 10 rows of Complications_and_Deaths
head(Complications_and_Deaths,10)
## # A tibble: 10 × 18
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 6 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 7 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 8 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 9 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 10 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## # ℹ 11 more variables: TelephoneNumber <chr>, MeasureId <chr>,
## # MeasureName <chr>, ComparedToNational <chr>, Denominator <chr>,
## # Score <chr>, LowerEstimate <chr>, HigherEstimate <chr>, Footnote <chr>,
## # StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Complications_and_Deaths %>%
select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()
# Replacing all "Not Applicable" with NA
Complications_and_Deaths <- as.data.frame(sapply(Complications_and_Deaths, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))
# Replacing all "Not Available" with NA
Complications_and_Deaths <- as.data.frame(sapply(Complications_and_Deaths, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
# Pivot to one row per facility, with one column per measure and value type
deathsClean <- Complications_and_Deaths %>%
  pivot_wider(
    names_from = MeasureId,
    values_from = c(ComparedToNational, Score),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(deathsClean)
## [1] 4814 41
head(deathsClean)
## # A tibble: 6 × 41
## FacilityName FacilityId State ComparedToNational_C…¹ ComparedToNational_M…²
## <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEAL… 010001 AL No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005 AL No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006 AL No Different Than the… Worse Than the Nation…
## 4 MIZELL MEMORIA… 010007 AL Number of Cases Too S… Number of Cases Too S…
## 5 CRENSHAW COMMU… 010008 AL <NA> Number of Cases Too S…
## 6 ST. VINCENT'S … 010011 AL No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹ComparedToNational_COMP_HIP_KNEE,
## # ²ComparedToNational_MORT_30_AMI
## # ℹ 36 more variables: ComparedToNational_MORT_30_CABG <chr>,
## # ComparedToNational_MORT_30_COPD <chr>, ComparedToNational_MORT_30_HF <chr>,
## # ComparedToNational_MORT_30_PN <chr>, ComparedToNational_MORT_30_STK <chr>,
## # ComparedToNational_PSI_03 <chr>, ComparedToNational_PSI_04 <chr>,
## # ComparedToNational_PSI_06 <chr>, ComparedToNational_PSI_08 <chr>, …
# Display first 10 rows of Payment_and_Value_of_Care
head(Payment_and_Value_of_Care,10)
## # A tibble: 10 × 22
## FacilityId FacilityName Address CityTown State ZipCode CountyParish
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 2 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 3 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 4 010001 SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN AL 36301 HOUSTON
## 5 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 6 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 7 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 8 010005 MARSHALL MEDICAL CENT… 2505 U… BOAZ AL 35957 MARSHALL
## 9 010006 NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL 35630 LAUDERDALE
## 10 010006 NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL 35630 LAUDERDALE
## # ℹ 15 more variables: TelephoneNumber <chr>, PaymentMeasureId <chr>,
## # PaymentMeasureName <chr>, PaymentCategory <chr>, Denominator <chr>,
## # Payment <chr>, LowerEstimate <chr>, HigherEstimate <chr>,
## # PaymentFootnote <dbl>, ValueOfCareDisplayId <chr>,
## # ValueOfCareDisplayName <chr>, ValueOfCareCategory <chr>,
## # ValueOfCareFootnote <dbl>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Payment_and_Value_of_Care %>%
select_if(is.numeric)
# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## PaymentFootnote ValueOfCareFootnote
## 9956 10044
# Replacing all "Not Applicable" with NA
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))
# Replacing all "Not Available" with NA
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
# Pivot to one row per facility, with one column per payment measure and value type
paymentClean <- Payment_and_Value_of_Care %>%
  pivot_wider(
    names_from = PaymentMeasureId,
    values_from = c(PaymentCategory, Payment),
    id_cols = c(FacilityName, FacilityId, State)
  )
# Check the new dataframe
dim(paymentClean)
## [1] 4645 11
head(paymentClean)
## # A tibble: 6 × 11
## FacilityName FacilityId State PaymentCategory_PAYM…¹ PaymentCategory_PAYM…²
## <chr> <chr> <chr> <chr> <chr>
## 1 SOUTHEAST HEAL… 010001 AL No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005 AL No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006 AL Greater Than the Nati… No Different Than the…
## 4 MIZELL MEMORIA… 010007 AL Number of Cases Too S… No Different Than the…
## 5 CRENSHAW COMMU… 010008 AL Number of Cases Too S… Number of Cases Too S…
## 6 ST. VINCENT'S … 010011 AL No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹PaymentCategory_PAYM_30_AMI,
## # ²PaymentCategory_PAYM_30_HF
## # ℹ 6 more variables: PaymentCategory_PAYM_30_PN <chr>,
## # PaymentCategory_PAYM_90_HIP_KNEE <chr>, Payment_PAYM_30_AMI <chr>,
## # Payment_PAYM_30_HF <chr>, Payment_PAYM_30_PN <chr>,
## # Payment_PAYM_90_HIP_KNEE <chr>
# Full-join all cleaned tables on FacilityId so no facility is dropped
HipKneeCleanTest <- readmissionsClean %>%
  full_join(HCAHPSClean, by = "FacilityId") %>%
  full_join(careClean, by = "FacilityId") %>%
  full_join(deathsClean, by = "FacilityId") %>%
  full_join(paymentClean, by = "FacilityId")
head(HipKneeCleanTest)
## # A tibble: 6 × 451
## FacilityName.x FacilityId State.x NumberOfDischarges_HIP-KN…¹
## <chr> <chr> <chr> <dbl>
## 1 SOUTHEAST HEALTH MEDICAL CENTER 010001 AL NA
## 2 MARSHALL MEDICAL CENTERS 010005 AL NA
## 3 NORTH ALABAMA MEDICAL CENTER 010006 AL NA
## 4 MIZELL MEMORIAL HOSPITAL 010007 AL NA
## 5 CRENSHAW COMMUNITY HOSPITAL 010008 AL NA
## 6 ST. VINCENT'S EAST 010011 AL NA
## # ℹ abbreviated name: ¹`NumberOfDischarges_HIP-KNEE`
## # ℹ 447 more variables: `ExcessReadmissionRatio_HIP-KNEE` <dbl>,
## # `PredictedReadmissionRate_HIP-KNEE` <dbl>,
## # `ExpectedReadmissionRate_HIP-KNEE` <dbl>,
## # `NumberOfReadmissions_HIP-KNEE` <dbl>, FacilityName.y <chr>, State.y <chr>,
## # PatientSurveyStarRating_H_COMP_1_A_P <chr>,
## # PatientSurveyStarRating_H_COMP_1_SN_P <chr>, …
# Remove the duplicated FacilityName/State columns that the joins suffixed with .x or .y
HipKneeCleanTest <- HipKneeCleanTest %>%
  select(-matches("\\.(x|y)$"))
# Checking the dimensions
dim(HipKneeCleanTest)
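After the suffixed columns are dropped, the surviving FacilityName and State columns appear to come from the last table joined, so a facility missing from that table would be left without a name. One possible alternative, sketched here in base R on hypothetical miniature tables, is to join only the measure columns and attach the identifying columns once from a combined lookup. It assumes each FacilityId has the same name and state in every source.

```r
# Hypothetical miniature versions of the per-source tables
readm <- data.frame(FacilityId = c("010001", "010005"),
                    FacilityName = c("A", "B"), State = c("AL", "AL"),
                    Ratio = c(0.89, 0.80))
care <- data.frame(FacilityId = c("010001", "010006"),
                   FacilityName = c("A", "C"), State = c("AL", "AL"),
                   OP_22 = c(5, 1))
pay <- data.frame(FacilityId = c("010005", "010006"),
                  FacilityName = c("B", "C"), State = c("AL", "AL"),
                  Payment = c(18030, 21898))

tables <- list(readm, care, pay)
id_cols <- c("FacilityId", "FacilityName", "State")

# One deduplicated lookup of identifying columns across all sources
facilities <- unique(do.call(rbind, lapply(tables, function(d) d[, id_cols])))

# Keep only the key plus measure columns, then full-outer-join on FacilityId
measures <- lapply(tables, function(d) {
  d[, setdiff(names(d), c("FacilityName", "State")), drop = FALSE]
})
joined <- Reduce(function(a, b) merge(a, b, by = "FacilityId", all = TRUE), measures)

# Attach names and states once, so no .x/.y suffixes are created
joined <- merge(facilities, joined, by = "FacilityId", all.y = TRUE)
print(joined)
```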
# Count NA values in each column
na_counts <- sapply(HipKneeCleanTest, function(x) sum(is.na(x)))
# View the NA counts
print(na_counts)
# Calculate the percentage of NA values for each column
na_percentage <- sapply(HipKneeCleanTest, function(x) mean(is.na(x)))
# Remove columns where more than 80% of the values are NA
HipKneeCleanTest <- HipKneeCleanTest[, na_percentage <= 0.8]
# Count NA values in each column
na_counts <- sapply(HipKneeCleanTest, function(x) sum(is.na(x)))
# View the NA counts
print(na_counts)
# Check the dimensions
dim(HipKneeCleanTest)
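The missing-value filter above compares a fraction between 0 and 1, not a percentage, against the 0.8 cutoff. A small sketch on a hypothetical toy data frame shows which columns survive; colMeans(is.na(...)) is used here as a base R shorthand for the per-column sapply() call.

```r
# Hypothetical toy data frame with different amounts of missingness
toy <- data.frame(
  partly_missing = c(NA, NA, NA, 4, 5),      # 60% NA: kept
  all_missing    = c(NA, NA, NA, NA, NA),    # 100% NA: dropped
  complete       = 1:5                       # 0% NA: kept
)

# Fraction of NA values per column
na_fraction <- colMeans(is.na(toy))
print(na_fraction)

# Keep columns whose NA fraction does not exceed the cutoff
kept <- toy[, na_fraction <= 0.8, drop = FALSE]
print(names(kept))
```

drop = FALSE keeps the result a data frame even if only one column passes the filter.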
# Remove columns containing 'AnswerPercent' or 'SurveyResponseRate'
HipKneeCleanTest <- HipKneeCleanTest %>%
select(-matches("AnswerPercent|SurveyResponseRate"))
# Check the dimensions
dim(HipKneeCleanTest)
## [1] 4816 87
# Remove columns containing 'ComparedToNational' or 'PaymentCategory'
HipKneeCleanTest <- HipKneeCleanTest %>%
select(-matches("ComparedToNational|PaymentCategory"))
# Check the dimensions
dim(HipKneeCleanTest)
## [1] 4816 67
str(HipKneeCleanTest)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
## $ FacilityId : chr [1:4816] "010001" "010005" "010006" "010007" ...
## $ ExcessReadmissionRatio_HIP-KNEE : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
## $ PredictedReadmissionRate_HIP-KNEE : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
## $ ExpectedReadmissionRate_HIP-KNEE : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
## $ NumberOfReadmissions_HIP-KNEE : num [1:4816] 3 1 7 5 NA 9 2 10 NA 1 ...
## $ PatientSurveyStarRating_H_COMP_1_STAR_RATING : chr [1:4816] "3" "3" "2" "3" ...
## $ PatientSurveyStarRating_H_COMP_2_STAR_RATING : chr [1:4816] "4" "4" "3" "5" ...
## $ PatientSurveyStarRating_H_COMP_3_STAR_RATING : chr [1:4816] "3" "2" "2" "4" ...
## $ PatientSurveyStarRating_H_COMP_5_STAR_RATING : chr [1:4816] "3" "3" "2" "3" ...
## $ PatientSurveyStarRating_H_COMP_6_STAR_RATING : chr [1:4816] "4" "3" "3" "4" ...
## $ PatientSurveyStarRating_H_COMP_7_STAR_RATING : chr [1:4816] "4" "3" "2" "4" ...
## $ PatientSurveyStarRating_H_CLEAN_STAR_RATING : chr [1:4816] "3" "2" "1" "2" ...
## $ PatientSurveyStarRating_H_QUIET_STAR_RATING : chr [1:4816] "4" "4" "4" "4" ...
## $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: chr [1:4816] "4" "3" "2" "4" ...
## $ PatientSurveyStarRating_H_RECMND_STAR_RATING : chr [1:4816] "4" "3" "2" "4" ...
## $ PatientSurveyStarRating_H_STAR_RATING : chr [1:4816] "4" "3" "2" "4" ...
## $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE : chr [1:4816] "89" "90" "88" "91" ...
## $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE : chr [1:4816] "91" "92" "89" "95" ...
## $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE : chr [1:4816] "81" "75" "75" "88" ...
## $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE : chr [1:4816] "77" "76" "71" "77" ...
## $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE : chr [1:4816] "87" "86" "83" "87" ...
## $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE : chr [1:4816] "82" "79" "77" "82" ...
## $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE : chr [1:4816] "84" "80" "74" "80" ...
## $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE : chr [1:4816] "86" "85" "85" "87" ...
## $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : chr [1:4816] "89" "85" "82" "89" ...
## $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE : chr [1:4816] "90" "83" "79" "88" ...
## $ EDV : chr [1:4816] "high" "high" "high" "low" ...
## $ ED_2_Strata_1 : chr [1:4816] NA "148" NA NA ...
## $ HCP_COVID_19 : chr [1:4816] "80.7" "79.8" "79" "57.9" ...
## $ IMM_3 : chr [1:4816] "95" "80" "67" "53" ...
## $ OP_18b : chr [1:4816] "215" "147" "177" "130" ...
## $ OP_18c : chr [1:4816] "317" "266" NA "216" ...
## $ OP_22 : chr [1:4816] "5" "3" "1" "4" ...
## $ OP_23 : chr [1:4816] NA NA "69" NA ...
## $ OP_29 : chr [1:4816] "47" "96" "85" "23" ...
## $ SAFE_USE_OF_OPIOIDS : chr [1:4816] "14" "19" "17" NA ...
## $ SEP_1 : chr [1:4816] "66" "74" "56" "86" ...
## $ SEP_SH_3HR : chr [1:4816] "70" "88" "77" NA ...
## $ SEP_SH_6HR : chr [1:4816] "100" "91" "81" NA ...
## $ SEV_SEP_3HR : chr [1:4816] "79" "88" "78" "89" ...
## $ SEV_SEP_6HR : chr [1:4816] "95" "96" "86" "97" ...
## $ STK_02 : chr [1:4816] "98" "100" "96" NA ...
## $ STK_05 : chr [1:4816] NA "91" NA NA ...
## $ STK_06 : chr [1:4816] NA NA "97" NA ...
## $ VTE_1 : chr [1:4816] "98" NA NA NA ...
## $ VTE_2 : chr [1:4816] "99" NA "97" NA ...
## $ Score_COMP_HIP_KNEE : chr [1:4816] "2.7" "2.3" "4.6" NA ...
## $ Score_MORT_30_AMI : chr [1:4816] "12" "13.6" "16.5" NA ...
## $ Score_MORT_30_COPD : chr [1:4816] "8.8" "9.9" "9.9" "13.7" ...
## $ Score_MORT_30_HF : chr [1:4816] "8.9" "14.9" "12.5" "12.5" ...
## $ Score_MORT_30_PN : chr [1:4816] "18" "23.3" "19.5" "28.5" ...
## $ Score_MORT_30_STK : chr [1:4816] "14.8" "15.3" "17.2" NA ...
## $ Score_PSI_03 : chr [1:4816] "0.39" "0.94" "1.39" "0.42" ...
## $ Score_PSI_04 : chr [1:4816] "184.68" "183.49" "173.63" NA ...
## $ Score_PSI_06 : chr [1:4816] "0.23" "0.22" "0.36" "0.24" ...
## $ Score_PSI_08 : chr [1:4816] "0.10" "0.09" "0.08" "0.09" ...
## $ Score_PSI_09 : chr [1:4816] "2.39" "2.69" "5.43" "2.49" ...
## $ Score_PSI_10 : chr [1:4816] "1.14" "1.37" "1.26" "1.57" ...
## $ Score_PSI_11 : chr [1:4816] "13.83" "7.19" "7.37" "8.45" ...
## $ Score_PSI_12 : chr [1:4816] "4.49" "3.01" "3.36" "3.89" ...
## $ Score_PSI_13 : chr [1:4816] "8.05" "4.46" "4.37" "5.19" ...
## $ Score_PSI_14 : chr [1:4816] "1.69" "1.87" "1.76" NA ...
## $ Score_PSI_15 : chr [1:4816] "0.93" "0.91" "1.34" "1.08" ...
## $ Score_PSI_90 : chr [1:4816] "1.21" "0.97" "1.17" "0.95" ...
## $ FacilityName : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
## $ State : chr [1:4816] "AL" "AL" "AL" "AL" ...
## $ Payment_PAYM_90_HIP_KNEE : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
# Convert columns to numeric
HipKneeCleanTest <- HipKneeCleanTest %>%
  mutate_at(vars(starts_with("PatientSurveyStarRating_"),
                 starts_with("HcahpsLinearMeanValue_"),
                 starts_with("Score_"),
                 starts_with("ED_"),
                 starts_with("IMM_"),
                 starts_with("OP_"),
                 starts_with("SEP_"),
                 starts_with("SEV_"),
                 starts_with("STK_"),
                 starts_with("VTE_"),
                 starts_with("SAFE_"),
                 starts_with("HCP_")),
            ~ as.numeric(as.character(.)))
# View the structure
str(HipKneeCleanTest)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
## $ FacilityId : chr [1:4816] "010001" "010005" "010006" "010007" ...
## $ ExcessReadmissionRatio_HIP-KNEE : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
## $ PredictedReadmissionRate_HIP-KNEE : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
## $ ExpectedReadmissionRate_HIP-KNEE : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
## $ NumberOfReadmissions_HIP-KNEE : num [1:4816] 3 1 7 5 NA 9 2 10 NA 1 ...
## $ PatientSurveyStarRating_H_COMP_1_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_2_STAR_RATING : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_3_STAR_RATING : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_5_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_6_STAR_RATING : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
## $ PatientSurveyStarRating_H_COMP_7_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_CLEAN_STAR_RATING : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
## $ PatientSurveyStarRating_H_QUIET_STAR_RATING : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
## $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_RECMND_STAR_RATING : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
## $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
## $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
## $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
## $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
## $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
## $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
## $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
## $ EDV : chr [1:4816] "high" "high" "high" "low" ...
## $ ED_2_Strata_1 : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
## $ HCP_COVID_19 : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
## $ IMM_3 : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
## $ OP_18b : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
## $ OP_18c : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
## $ OP_22 : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
## $ OP_23 : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
## $ OP_29 : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
## $ SAFE_USE_OF_OPIOIDS : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
## $ SEP_1 : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
## $ SEP_SH_3HR : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
## $ SEP_SH_6HR : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
## $ SEV_SEP_3HR : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
## $ SEV_SEP_6HR : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
## $ STK_02 : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
## $ STK_05 : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
## $ STK_06 : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
## $ VTE_1 : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
## $ VTE_2 : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
## $ Score_COMP_HIP_KNEE : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
## $ Score_MORT_30_AMI : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
## $ Score_MORT_30_COPD : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
## $ Score_MORT_30_HF : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
## $ Score_MORT_30_PN : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
## $ Score_MORT_30_STK : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
## $ Score_PSI_03 : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
## $ Score_PSI_04 : num [1:4816] 185 183 174 NA NA ...
## $ Score_PSI_06 : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
## $ Score_PSI_08 : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
## $ Score_PSI_09 : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
## $ Score_PSI_10 : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
## $ Score_PSI_11 : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
## $ Score_PSI_12 : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
## $ Score_PSI_13 : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
## $ Score_PSI_14 : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
## $ Score_PSI_15 : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
## $ Score_PSI_90 : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
## $ FacilityName : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
## $ State : chr [1:4816] "AL" "AL" "AL" "AL" ...
## $ Payment_PAYM_90_HIP_KNEE : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
# Remove $ and , and convert to numeric
HipKneeCleanTest <- HipKneeCleanTest %>%
  mutate_at(vars(starts_with("Payment_")),
            ~ as.numeric(gsub("[\\$,]", "", .)))
# Checking the structure
str(HipKneeCleanTest)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
## $ FacilityId : chr [1:4816] "010001" "010005" "010006" "010007" ...
## $ ExcessReadmissionRatio_HIP-KNEE : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
## $ PredictedReadmissionRate_HIP-KNEE : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
## $ ExpectedReadmissionRate_HIP-KNEE : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
## $ NumberOfReadmissions_HIP-KNEE : num [1:4816] 3 1 7 5 NA 9 2 10 NA 1 ...
## $ PatientSurveyStarRating_H_COMP_1_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_2_STAR_RATING : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_3_STAR_RATING : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_5_STAR_RATING : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
## $ PatientSurveyStarRating_H_COMP_6_STAR_RATING : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
## $ PatientSurveyStarRating_H_COMP_7_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ PatientSurveyStarRating_H_CLEAN_STAR_RATING : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
## $ PatientSurveyStarRating_H_QUIET_STAR_RATING : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
## $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_RECMND_STAR_RATING : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
## $ PatientSurveyStarRating_H_STAR_RATING : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
## $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
## $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
## $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
## $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
## $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
## $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
## $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
## $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
## $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
## $ EDV : chr [1:4816] "high" "high" "high" "low" ...
## $ ED_2_Strata_1 : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
## $ HCP_COVID_19 : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
## $ IMM_3 : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
## $ OP_18b : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
## $ OP_18c : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
## $ OP_22 : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
## $ OP_23 : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
## $ OP_29 : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
## $ SAFE_USE_OF_OPIOIDS : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
## $ SEP_1 : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
## $ SEP_SH_3HR : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
## $ SEP_SH_6HR : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
## $ SEV_SEP_3HR : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
## $ SEV_SEP_6HR : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
## $ STK_02 : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
## $ STK_05 : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
## $ STK_06 : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
## $ VTE_1 : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
## $ VTE_2 : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
## $ Score_COMP_HIP_KNEE : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
## $ Score_MORT_30_AMI : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
## $ Score_MORT_30_COPD : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
## $ Score_MORT_30_HF : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
## $ Score_MORT_30_PN : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
## $ Score_MORT_30_STK : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
## $ Score_PSI_03 : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
## $ Score_PSI_04 : num [1:4816] 185 183 174 NA NA ...
## $ Score_PSI_06 : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
## $ Score_PSI_08 : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
## $ Score_PSI_09 : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
## $ Score_PSI_10 : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
## $ Score_PSI_11 : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
## $ Score_PSI_12 : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
## $ Score_PSI_13 : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
## $ Score_PSI_14 : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
## $ Score_PSI_15 : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
## $ Score_PSI_90 : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
## $ FacilityName : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
## $ State : chr [1:4816] "AL" "AL" "AL" "AL" ...
## $ Payment_PAYM_90_HIP_KNEE : num [1:4816] 22212 18030 21898 NA NA ...
# Create function to find categorical variables
is_categorical <- function(x) is.factor(x) || is.character(x)
# Apply function to all variables in the dataset
categorical_vars <- sapply(HipKneeClean, is_categorical)
# Print the names of all categorical variables
categorical <- names(HipKneeClean)[categorical_vars]
categorical
## [1] "FacilityId" "EDV" "FacilityName" "State"
# Define the encoding mapping (ignore NAs for now)
encoding_map <- c(
'low' = 1,
'medium' = 2,
'high' = 3,
'very high' = 4
)
# Ordinal (integer) encoding is used because the EDV categories have a natural order
# Create a copy of HipKneeCleanTest named HipKneeTest to keep the cleaned dataset separate from the encoded test dataset
HipKneeTest <- HipKneeCleanTest %>%
mutate(EDV = recode(EDV, !!!encoding_map))
# Print first 20 rows of the EDV column in HipKneeCleanTest and HipKneeTest to ensure proper encoding
cat("HipKneeCleanTest")
## HipKneeCleanTest
print(head(HipKneeCleanTest$EDV, 20))
## [1] "high" "high" "high" "low" "low" "high"
## [7] "low" "medium" "low" "medium" "low" "low"
## [13] "high" "high" "very high" "very high" "low" "high"
## [19] "low" "very high"
cat("HipKneeTest")
## HipKneeTest
print(head(HipKneeTest$EDV, 20))
## [1] 3 3 3 1 1 3 1 2 1 2 1 1 3 3 4 4 1 3 1 4
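The same ordinal encoding can also be sketched with an ordered factor, which states the level order explicitly and turns any unexpected label into `NA` instead of passing it through. This is a minimal illustration on a made-up vector (`edv_example` is hypothetical, not the project data).

```r
# Hypothetical example vector; the real column is HipKneeCleanTest$EDV
edv_example <- c("high", "low", "medium", "very high", NA, "low")

# Declare the level order once, then convert to integer codes
# (1 = low, 2 = medium, 3 = high, 4 = very high)
edv_levels <- c("low", "medium", "high", "very high")
edv_codes <- as.integer(factor(edv_example, levels = edv_levels, ordered = TRUE))

print(edv_codes)
```

Missing values stay missing, so they can still be handled in the later imputation step.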
# Manually map each state abbreviation to a code, numbered in alphabetical order of state name and stored as a zero-padded character string so the codes are treated as categorical rather than numeric
state_mapping <- c(
"AL" = "001",
"AK" = "002",
"AZ" = "003",
"AR" = "004",
"CA" = "005",
"CO" = "006",
"CT" = "007",
"DE" = "008",
"FL" = "009",
"GA" = "010",
"HI" = "011",
"ID" = "012",
"IL" = "013",
"IN" = "014",
"IA" = "015",
"KS" = "016",
"KY" = "017",
"LA" = "018",
"ME" = "019",
"MD" = "020",
"MA" = "021",
"MI" = "022",
"MN" = "023",
"MS" = "024",
"MO" = "025",
"MT" = "026",
"NE" = "027",
"NV" = "028",
"NH" = "029",
"NJ" = "030",
"NM" = "031",
"NY" = "032",
"NC" = "033",
"ND" = "034",
"OH" = "035",
"OK" = "036",
"OR" = "037",
"PA" = "038",
"RI" = "039",
"SC" = "040",
"SD" = "041",
"TN" = "042",
"TX" = "043",
"UT" = "044",
"VT" = "045",
"VA" = "046",
"WA" = "047",
"WV" = "048",
"WI" = "049",
"WY" = "050"
)
# Create new "StateCode" column with the encoded values
HipKneeTest <- HipKneeTest %>%
mutate(StateCode = state_mapping[State])
# Print 100 rows of the "State" and "StateCode" columns to ensure accuracy
print("State and StateCode Columns")
## [1] "State and StateCode Columns"
print(head(HipKneeTest[c("State", "StateCode")], 100))
## # A tibble: 100 × 2
## State StateCode
## <chr> <chr>
## 1 AL 001
## 2 AL 001
## 3 AL 001
## 4 AL 001
## 5 AL 001
## 6 AL 001
## 7 AL 001
## 8 AL 001
## 9 AL 001
## 10 AL 001
## # ℹ 90 more rows
# Print all unique values in "StateCode" column to ensure accuracy
print("Unique StateCode Values")
## [1] "Unique StateCode Values"
print(unique(HipKneeTest$StateCode))
## [1] "001" "002" "003" "004" "005" "006" "007" "008" NA "009" "010" "011"
## [13] "012" "013" "014" "015" "016" "017" "018" "019" "020" "021" "022" "023"
## [25] "024" "025" "026" "027" "028" "029" "030" "031" "032" "033" "034" "035"
## [37] "036" "037" "038" "039" "040" "041" "042" "043" "044" "045" "046" "047"
## [49] "048" "049" "050"
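The `NA` in the unique values above means that at least one `State` entry has no match in the mapping. A short sketch of how such entries could be located is given below. It rebuilds the same 50-state mapping from base R's `state.abb` vector (which is ordered by state name) and runs on a made-up vector; `state_example`, including the "DC" and "PR" entries, is an assumption for illustration only.

```r
# state.abb ships with base R (datasets package) and lists the 50 states
# in alphabetical order of state name, matching the manual mapping
state_mapping_alt <- setNames(sprintf("%03d", seq_along(state.abb)), state.abb)

# Hypothetical State values, two of which are not among the 50 states
state_example <- c("AL", "DE", "DC", "FL", "PR", "WY")

# Codes are NA wherever the abbreviation has no entry in the mapping
codes <- unname(state_mapping_alt[state_example])
unmapped <- setdiff(state_example, names(state_mapping_alt))

print(codes)
print(unmapped)
```

Listing the unmapped abbreviations before dropping rows makes it explicit which facilities are excluded by the later `drop_na()` call.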
# Compute correlation matrix
cor_matrix <- cor(HipKneeTest %>% select_if(is.numeric), use = "pairwise.complete.obs")
# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)
# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), name = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")
# Convert correlation matrix to df
cor_table <- as.data.frame(cor_matrix)
# Add variable names as a column
cor_table$Variable <- rownames(cor_table)
# Reorder columns
cor_table <- cor_table %>%
select(Variable, everything())
# Print table
cor_table %>%
kable(caption = "Table 8. Correlation Coefficients Table") %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| | Variable | PredictedReadmissionRate_HIP-KNEE | HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | EDV | HCP_COVID_19 | IMM_3 | OP_18b | OP_29 | SAFE_USE_OF_OPIOIDS | VTE_1 | Score_COMP_HIP_KNEE | Score_MORT_30_AMI | Score_MORT_30_COPD | Score_MORT_30_HF | Score_MORT_30_PN | Score_MORT_30_STK | Score_PSI_03 | Score_PSI_04 | Score_PSI_06 | Score_PSI_08 | Score_PSI_09 | Score_PSI_10 | Score_PSI_11 | Score_PSI_12 | Score_PSI_13 | Score_PSI_14 | Score_PSI_15 | Payment_PAYM_90_HIP_KNEE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PredictedReadmissionRate_HIP-KNEE | PredictedReadmissionRate_HIP-KNEE | 1.0000000 | -0.2060912 | 0.1986939 | -0.0563082 | -0.0028840 | 0.1295727 | -0.0106510 | 0.1063002 | 0.0654668 | 0.3208550 | 0.0074065 | -0.0794948 | -0.1067828 | -0.0985660 | -0.0376746 | -0.0037334 | -0.0449077 | 0.0154891 | -0.0214412 | -0.0182303 | 0.0710046 | 0.1130121 | 0.1047402 | 0.1193336 | 0.0140012 | -0.0158282 | 0.2975679 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | -0.2060912 | 1.0000000 | -0.2262341 | 0.0302154 | 0.2182505 | -0.2448842 | 0.0865028 | 0.1100002 | -0.0248375 | -0.1091775 | -0.0952111 | -0.0230632 | 0.0300940 | -0.0702915 | -0.0760295 | -0.0499063 | -0.0948401 | -0.0054742 | -0.0590998 | 0.0707774 | -0.0157615 | -0.1615833 | -0.0674588 | -0.1458284 | -0.0172231 | 0.0304634 | -0.2108956 |
| EDV | EDV | 0.1986939 | -0.2262341 | 1.0000000 | 0.1599806 | 0.0038674 | 0.5918897 | 0.0603877 | -0.1223889 | 0.2992859 | -0.0240093 | -0.0687401 | -0.0840621 | -0.2588739 | -0.1904351 | -0.0754281 | 0.0292819 | -0.0438108 | -0.0308989 | -0.1637017 | -0.0181523 | 0.0837980 | 0.0455190 | 0.0749218 | 0.0724936 | 0.0399935 | 0.0144400 | 0.0053946 |
| HCP_COVID_19 | HCP_COVID_19 | -0.0563082 | 0.0302154 | 0.1599806 | 1.0000000 | 0.3203622 | 0.2574291 | 0.1067941 | -0.0812735 | 0.0241622 | -0.0510683 | -0.0869890 | -0.1128278 | -0.1245435 | -0.1523779 | -0.0988833 | 0.0943953 | 0.0417007 | 0.0225916 | -0.0272232 | 0.0549990 | 0.0009222 | -0.0909811 | 0.1091949 | -0.0160408 | 0.0169512 | 0.0430812 | -0.0627505 |
| IMM_3 | IMM_3 | -0.0028840 | 0.2182505 | 0.0038674 | 0.3203622 | 1.0000000 | 0.1105628 | 0.1317922 | 0.0410289 | 0.0906329 | -0.0212916 | -0.0165321 | -0.0616051 | -0.0010634 | -0.0761104 | 0.0146397 | 0.0451508 | 0.0601831 | 0.0544576 | -0.0311579 | 0.0899361 | 0.0412713 | -0.0625676 | 0.0594933 | -0.0250714 | 0.0723872 | 0.0625753 | -0.0720431 |
| OP_18b | OP_18b | 0.1295727 | -0.2448842 | 0.5918897 | 0.2574291 | 0.1105628 | 1.0000000 | 0.0506067 | -0.1400845 | 0.2344307 | -0.0293698 | -0.0678837 | -0.1527290 | -0.2195933 | -0.1858291 | -0.0905644 | 0.0583644 | 0.0638412 | 0.0187794 | -0.0806516 | 0.0224833 | 0.0544528 | -0.0076797 | 0.1653812 | 0.0993570 | 0.0676972 | 0.0374530 | -0.0241965 |
| OP_29 | OP_29 | -0.0106510 | 0.0865028 | 0.0603877 | 0.1067941 | 0.1317922 | 0.0506067 | 1.0000000 | -0.0650231 | 0.1567526 | -0.0096464 | -0.0600569 | 0.0081252 | 0.0099705 | -0.0536184 | -0.0251654 | -0.0032584 | 0.0312780 | 0.0059688 | -0.0199006 | 0.0150699 | 0.0331569 | -0.0837910 | -0.0108546 | -0.0087678 | 0.0163779 | 0.0419068 | -0.0815209 |
| SAFE_USE_OF_OPIOIDS | SAFE_USE_OF_OPIOIDS | 0.1063002 | 0.1100002 | -0.1223889 | -0.0812735 | 0.0410289 | -0.1400845 | -0.0650231 | 1.0000000 | -0.0563373 | -0.0081923 | -0.0643353 | -0.0362573 | 0.0171605 | -0.0204107 | -0.0804707 | -0.0287158 | -0.0995591 | -0.0603629 | -0.0017558 | -0.0043396 | -0.0382369 | 0.0137283 | -0.0380145 | -0.0258097 | -0.0166146 | -0.0268805 | -0.0048449 |
| VTE_1 | VTE_1 | 0.0654668 | -0.0248375 | 0.2992859 | 0.0241622 | 0.0906329 | 0.2344307 | 0.1567526 | -0.0563373 | 1.0000000 | -0.0526925 | -0.0493931 | -0.0282911 | -0.1051171 | -0.1710235 | -0.1238916 | -0.0378500 | -0.1180160 | -0.0262166 | -0.0522876 | -0.0577213 | -0.0037803 | -0.0301775 | -0.0317534 | -0.0440578 | 0.0086141 | 0.0326658 | -0.1256363 |
| Score_COMP_HIP_KNEE | Score_COMP_HIP_KNEE | 0.3208550 | -0.1091775 | -0.0240093 | -0.0510683 | -0.0212916 | -0.0293698 | -0.0096464 | -0.0081923 | -0.0526925 | 1.0000000 | 0.0830479 | -0.0203930 | -0.0007242 | 0.0241066 | 0.0211621 | 0.0498557 | 0.0038509 | 0.0505415 | 0.0577776 | 0.0540124 | 0.0813038 | 0.1279724 | 0.1458258 | 0.1334619 | 0.0498603 | 0.0433809 | 0.3410864 |
| Score_MORT_30_AMI | Score_MORT_30_AMI | 0.0074065 | -0.0952111 | -0.0687401 | -0.0869890 | -0.0165321 | -0.0678837 | -0.0600569 | -0.0643353 | -0.0493931 | 0.0830479 | 1.0000000 | 0.2498600 | 0.3407616 | 0.3309425 | 0.2222539 | 0.0415523 | 0.2105379 | 0.0885083 | 0.1010348 | 0.0889343 | 0.1066619 | 0.1037006 | 0.0492328 | 0.0467554 | 0.0454462 | 0.0297688 | 0.0591548 |
| Score_MORT_30_COPD | Score_MORT_30_COPD | -0.0794948 | -0.0230632 | -0.0840621 | -0.1128278 | -0.0616051 | -0.1527290 | 0.0081252 | -0.0362573 | -0.0282911 | -0.0203930 | 0.2498600 | 1.0000000 | 0.3844105 | 0.3710744 | 0.2038243 | -0.0069743 | 0.1713379 | 0.0478268 | 0.0397571 | 0.0429090 | 0.0320669 | 0.0426574 | -0.0532586 | 0.0026944 | 0.0734846 | 0.0340007 | -0.0406696 |
| Score_MORT_30_HF | Score_MORT_30_HF | -0.1067828 | 0.0300940 | -0.2588739 | -0.1245435 | -0.0010634 | -0.2195933 | 0.0099705 | 0.0171605 | -0.1051171 | -0.0007242 | 0.3407616 | 0.3844105 | 1.0000000 | 0.4479367 | 0.3147371 | 0.0371596 | 0.2556384 | 0.0679149 | 0.1051698 | 0.0707269 | 0.0383771 | 0.0362529 | -0.0300702 | -0.0086832 | 0.0647245 | 0.0342374 | -0.0350247 |
| Score_MORT_30_PN | Score_MORT_30_PN | -0.0985660 | -0.0702915 | -0.1904351 | -0.1523779 | -0.0761104 | -0.1858291 | -0.0536184 | -0.0204107 | -0.1710235 | 0.0241066 | 0.3309425 | 0.3710744 | 0.4479367 | 1.0000000 | 0.3042563 | 0.0303815 | 0.2301195 | 0.0543554 | 0.0884315 | 0.0217880 | 0.0237048 | 0.0704445 | 0.0089560 | 0.0393676 | 0.0464407 | 0.0029691 | -0.0062985 |
| Score_MORT_30_STK | Score_MORT_30_STK | -0.0376746 | -0.0760295 | -0.0754281 | -0.0988833 | 0.0146397 | -0.0905644 | -0.0251654 | -0.0804707 | -0.1238916 | 0.0211621 | 0.2222539 | 0.2038243 | 0.3147371 | 0.3042563 | 1.0000000 | 0.0687216 | 0.2380935 | 0.0878847 | 0.1014879 | 0.0674377 | 0.0622532 | 0.0725381 | 0.0474896 | 0.0513975 | 0.0492194 | 0.0625191 | -0.0272101 |
| Score_PSI_03 | Score_PSI_03 | -0.0037334 | -0.0499063 | 0.0292819 | 0.0943953 | 0.0451508 | 0.0583644 | -0.0032584 | -0.0287158 | -0.0378500 | 0.0498557 | 0.0415523 | -0.0069743 | 0.0371596 | 0.0303815 | 0.0687216 | 1.0000000 | 0.1353085 | 0.0601750 | 0.0636661 | 0.1407342 | 0.0386211 | 0.0114365 | 0.1186788 | 0.0298580 | 0.0596798 | 0.0999683 | 0.0086745 |
| Score_PSI_04 | Score_PSI_04 | -0.0449077 | -0.0948401 | -0.0438108 | 0.0417007 | 0.0601831 | 0.0638412 | 0.0312780 | -0.0995591 | -0.1180160 | 0.0038509 | 0.2105379 | 0.1713379 | 0.2556384 | 0.2301195 | 0.2380935 | 0.1353085 | 1.0000000 | 0.0601419 | 0.0870693 | 0.1059485 | 0.0523892 | 0.0649032 | 0.0782559 | 0.0123489 | 0.0652098 | 0.1018205 | -0.0766302 |
| Score_PSI_06 | Score_PSI_06 | 0.0154891 | -0.0054742 | -0.0308989 | 0.0225916 | 0.0544576 | 0.0187794 | 0.0059688 | -0.0603629 | -0.0262166 | 0.0505415 | 0.0885083 | 0.0478268 | 0.0679149 | 0.0543554 | 0.0878847 | 0.0601750 | 0.0601419 | 1.0000000 | 0.0724291 | 0.1014588 | 0.0516246 | 0.0351464 | 0.1431056 | 0.0509831 | 0.0527115 | 0.0910520 | 0.0456525 |
| Score_PSI_08 | Score_PSI_08 | -0.0214412 | -0.0590998 | -0.1637017 | -0.0272232 | -0.0311579 | -0.0806516 | -0.0199006 | -0.0017558 | -0.0522876 | 0.0577776 | 0.1010348 | 0.0397571 | 0.1051698 | 0.0884315 | 0.1014879 | 0.0636661 | 0.0870693 | 0.0724291 | 1.0000000 | 0.0052449 | -0.0360093 | 0.0198090 | 0.0394605 | 0.0093444 | 0.0228045 | 0.0127268 | -0.0041983 |
| Score_PSI_09 | Score_PSI_09 | -0.0182303 | 0.0707774 | -0.0181523 | 0.0549990 | 0.0899361 | 0.0224833 | 0.0150699 | -0.0043396 | -0.0577213 | 0.0540124 | 0.0889343 | 0.0429090 | 0.0707269 | 0.0217880 | 0.0674377 | 0.1407342 | 0.1059485 | 0.1014588 | 0.0052449 | 1.0000000 | 0.0885278 | 0.0680540 | 0.1732337 | 0.0519119 | 0.1207438 | 0.2197254 | -0.0237660 |
| Score_PSI_10 | Score_PSI_10 | 0.0710046 | -0.0157615 | 0.0837980 | 0.0009222 | 0.0412713 | 0.0544528 | 0.0331569 | -0.0382369 | -0.0037803 | 0.0813038 | 0.1066619 | 0.0320669 | 0.0383771 | 0.0237048 | 0.0622532 | 0.0386211 | 0.0523892 | 0.0516246 | -0.0360093 | 0.0885278 | 1.0000000 | 0.1626632 | 0.1079488 | 0.2303938 | 0.0453739 | 0.0830134 | 0.0497447 |
| Score_PSI_11 | Score_PSI_11 | 0.1130121 | -0.1615833 | 0.0455190 | -0.0909811 | -0.0625676 | -0.0076797 | -0.0837910 | 0.0137283 | -0.0301775 | 0.1279724 | 0.1037006 | 0.0426574 | 0.0362529 | 0.0704445 | 0.0725381 | 0.0114365 | 0.0649032 | 0.0351464 | 0.0198090 | 0.0680540 | 0.1626632 | 1.0000000 | 0.1172504 | 0.2506376 | -0.0093577 | 0.0464067 | 0.1441986 |
| Score_PSI_12 | Score_PSI_12 | 0.1047402 | -0.0674588 | 0.0749218 | 0.1091949 | 0.0594933 | 0.1653812 | -0.0108546 | -0.0380145 | -0.0317534 | 0.1458258 | 0.0492328 | -0.0532586 | -0.0300702 | 0.0089560 | 0.0474896 | 0.1186788 | 0.0782559 | 0.1431056 | 0.0394605 | 0.1732337 | 0.1079488 | 0.1172504 | 1.0000000 | 0.1742084 | 0.0522204 | 0.1358951 | 0.0655557 |
| Score_PSI_13 | Score_PSI_13 | 0.1193336 | -0.1458284 | 0.0724936 | -0.0160408 | -0.0250714 | 0.0993570 | -0.0087678 | -0.0258097 | -0.0440578 | 0.1334619 | 0.0467554 | 0.0026944 | -0.0086832 | 0.0393676 | 0.0513975 | 0.0298580 | 0.0123489 | 0.0509831 | 0.0093444 | 0.0519119 | 0.2303938 | 0.2506376 | 0.1742084 | 1.0000000 | 0.0056987 | 0.0878105 | 0.0949467 |
| Score_PSI_14 | Score_PSI_14 | 0.0140012 | -0.0172231 | 0.0399935 | 0.0169512 | 0.0723872 | 0.0676972 | 0.0163779 | -0.0166146 | 0.0086141 | 0.0498603 | 0.0454462 | 0.0734846 | 0.0647245 | 0.0464407 | 0.0492194 | 0.0596798 | 0.0652098 | 0.0527115 | 0.0228045 | 0.1207438 | 0.0453739 | -0.0093577 | 0.0522204 | 0.0056987 | 1.0000000 | 0.1176726 | -0.0181150 |
| Score_PSI_15 | Score_PSI_15 | -0.0158282 | 0.0304634 | 0.0144400 | 0.0430812 | 0.0625753 | 0.0374530 | 0.0419068 | -0.0268805 | 0.0326658 | 0.0433809 | 0.0297688 | 0.0340007 | 0.0342374 | 0.0029691 | 0.0625191 | 0.0999683 | 0.1018205 | 0.0910520 | 0.0127268 | 0.2197254 | 0.0830134 | 0.0464067 | 0.1358951 | 0.0878105 | 0.1176726 | 1.0000000 | -0.0467071 |
| Payment_PAYM_90_HIP_KNEE | Payment_PAYM_90_HIP_KNEE | 0.2975679 | -0.2108956 | 0.0053946 | -0.0627505 | -0.0720431 | -0.0241965 | -0.0815209 | -0.0048449 | -0.1256363 | 0.3410864 | 0.0591548 | -0.0406696 | -0.0350247 | -0.0062985 | -0.0272101 | 0.0086745 | -0.0766302 | 0.0456525 | -0.0041983 | -0.0237660 | 0.0497447 | 0.1441986 | 0.0655557 | 0.0949467 | -0.0181150 | -0.0467071 | 1.0000000 |
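In the table above, the largest correlations with the predicted readmission rate are Score_COMP_HIP_KNEE (0.32), Payment_PAYM_90_HIP_KNEE (0.30), the HCAHPS rating (-0.21), and EDV (0.20). A full matrix is hard to scan for this, so the sketch below shows one way to rank variables by the absolute size of their correlation with a target. It uses a small made-up data frame (`toy` and its column names are placeholders).

```r
# Toy data standing in for the numeric columns; names are illustrative only
toy <- data.frame(
  target = c(1, 2, 3, 4, 5),
  a = c(2, 4, 6, 8, 10),
  b = c(5, 4, 3, 1, 2),
  c = c(1, 3, 2, 5, 4)
)

toy_cor <- cor(toy, use = "pairwise.complete.obs")

# Take the target's column, drop its self-correlation, and sort by absolute size
target_cor <- toy_cor[rownames(toy_cor) != "target", "target"]
ranked <- target_cor[order(abs(target_cor), decreasing = TRUE)]

print(round(ranked, 2))
```

Sorting by absolute value keeps strong negative associations, such as the HCAHPS rating here, near the top of the list.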
# Change - to _ in HIP-KNEE
colnames(HipKneeTest) <- gsub("-", "_", colnames(HipKneeTest))
# Remove rows with a missing value in the target variable "PredictedReadmissionRate_HIP_KNEE"
HipKneeTest <- HipKneeTest %>% filter(!is.na(PredictedReadmissionRate_HIP_KNEE))
# Remove rows with a missing value in the "State", "StateCode", or "FacilityName" columns
HipKneeTest <- HipKneeTest %>% drop_na(State, StateCode, FacilityName)
# Print number of remaining variables and observations
dimensions <- dim(HipKneeTest)
cat("Number of variables:", dimensions[2], "\n")
## Number of variables: 31
cat("Number of observations:", dimensions[1], "\n")
## Number of observations: 1833
# Calculate missing values
missing_values_summary <- HipKneeTest %>%
summarise(across(everything(), ~ sum(is.na(.)))) %>%
pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTest)) * 100)
# Print table
missing_values_summary %>%
kable(caption = "Table 7. Missing Values Summary") %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| Variable | Missing_Count | Missing_Percentage |
|---|---|---|
| FacilityId | 0 | 0.0000000 |
| PredictedReadmissionRate_HIP_KNEE | 0 | 0.0000000 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 33 | 1.8003273 |
| EDV | 90 | 4.9099836 |
| HCP_COVID_19 | 16 | 0.8728860 |
| IMM_3 | 16 | 0.8728860 |
| OP_18b | 75 | 4.0916530 |
| OP_29 | 222 | 12.1112930 |
| SAFE_USE_OF_OPIOIDS | 69 | 3.7643208 |
| VTE_1 | 994 | 54.2280415 |
| Score_COMP_HIP_KNEE | 40 | 2.1822149 |
| Score_MORT_30_AMI | 405 | 22.0949264 |
| Score_MORT_30_COPD | 247 | 13.4751773 |
| Score_MORT_30_HF | 141 | 7.6923077 |
| Score_MORT_30_PN | 125 | 6.8194217 |
| Score_MORT_30_STK | 284 | 15.4937261 |
| Score_PSI_03 | 8 | 0.4364430 |
| Score_PSI_04 | 575 | 31.3693399 |
| Score_PSI_06 | 2 | 0.1091107 |
| Score_PSI_08 | 2 | 0.1091107 |
| Score_PSI_09 | 2 | 0.1091107 |
| Score_PSI_10 | 41 | 2.2367703 |
| Score_PSI_11 | 40 | 2.1822149 |
| Score_PSI_12 | 2 | 0.1091107 |
| Score_PSI_13 | 42 | 2.2913257 |
| Score_PSI_14 | 87 | 4.7463175 |
| Score_PSI_15 | 29 | 1.5821058 |
| FacilityName | 0 | 0.0000000 |
| State | 0 | 0.0000000 |
| Payment_PAYM_90_HIP_KNEE | 42 | 2.2913257 |
| StateCode | 0 | 0.0000000 |
# Impute missing values with the column median for columns with <5% missing values
numeric_vars_low_missing <- c("HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "EDV", "HCP_COVID_19", "IMM_3", "OP_18b", "SAFE_USE_OF_OPIOIDS", "Score_COMP_HIP_KNEE", "Score_PSI_03", "Score_PSI_06", "Score_PSI_08", "Score_PSI_09", "Score_PSI_10", "Score_PSI_11", "Score_PSI_12", "Score_PSI_13", "Score_PSI_14", "Score_PSI_15", "Payment_PAYM_90_HIP_KNEE")
for (var in numeric_vars_low_missing) {
HipKneeTest[[var]][is.na(HipKneeTest[[var]])] <- median(HipKneeTest[[var]], na.rm = TRUE)
}
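The loop above can also be written as a small helper applied column by column. The sketch below is a minimal version on a made-up data frame; `impute_median` and `toy` are illustrative names, not part of the original analysis.

```r
# Replace NAs in a numeric vector with the median of its observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical data frame; column names are placeholders
toy <- data.frame(score = c(1, NA, 3, 10), rate = c(NA, 2, 4, 6))
toy[] <- lapply(toy, impute_median)

print(toy)
```

Median imputation leaves the centre of each distribution unchanged but shrinks its variance slightly, which is one reason to reserve it for columns with few missing values.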
# Select high missingness variables for KNN imputation
vars_for_knn <- c("VTE_1", "Score_MORT_30_AMI", "Score_MORT_30_COPD", "Score_MORT_30_HF", "Score_MORT_30_PN", "Score_MORT_30_STK", "Score_PSI_04", "OP_29")
# Perform KNN imputation
HipKneeTest_knn <- kNN(HipKneeTest, variable = vars_for_knn, k = 5)
# Remove columns created by the KNN function
HipKneeTest_knn <- HipKneeTest_knn %>% select(-ends_with("_imp"))
# Update HipKneeTest with imputed values
HipKneeTest[vars_for_knn] <- HipKneeTest_knn[vars_for_knn]
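To make the idea behind k-nearest-neighbour imputation concrete, the sketch below fills one column from the median of its nearest neighbours. It is a stripped-down base R illustration, not the `kNN()` function used above: it assumes complete numeric predictors and plain Euclidean distance on scaled values, whereas the package function is more general. `knn_impute_one` and `toy` are hypothetical names.

```r
# Impute one numeric column from the k nearest rows (Euclidean distance
# on scaled predictors). Assumes the predictor columns have no NAs.
knn_impute_one <- function(df, target, predictors, k = 5) {
  X <- scale(as.matrix(df[predictors]))
  missing_rows <- which(is.na(df[[target]]))
  observed_rows <- which(!is.na(df[[target]]))
  for (i in missing_rows) {
    # Distance from row i to every row with an observed target value
    diffs <- sweep(X[observed_rows, , drop = FALSE], 2, X[i, ])
    d <- sqrt(rowSums(diffs^2))
    nearest <- observed_rows[order(d)[seq_len(min(k, length(observed_rows)))]]
    # Fill the gap with the median of the neighbours' observed values
    df[[target]][i] <- median(df[[target]][nearest])
  }
  df
}

# Made-up data: the missing y sits next to the rows with x = 1 and x = 2
toy <- data.frame(x = c(1, 2, 3, 10, 11, 12), y = c(10, 20, NA, 100, 110, 120))
imputed <- knn_impute_one(toy, target = "y", predictors = "x", k = 2)

print(imputed)
```

Because the imputed value depends on which columns define "nearness", it is worth checking that identifier columns such as facility IDs or names do not enter the distance calculation.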
# Calculate missing values
missing_values_summary <- HipKneeTest %>%
summarise(across(everything(), ~ sum(is.na(.)))) %>%
pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTest)) * 100)
# Print table
missing_values_summary %>%
kable(caption = "Table 7. Missing Values Summary After Imputation") %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
| Variable | Missing_Count | Missing_Percentage |
|---|---|---|
| FacilityId | 0 | 0 |
| PredictedReadmissionRate_HIP_KNEE | 0 | 0 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 0 | 0 |
| EDV | 0 | 0 |
| HCP_COVID_19 | 0 | 0 |
| IMM_3 | 0 | 0 |
| OP_18b | 0 | 0 |
| OP_29 | 0 | 0 |
| SAFE_USE_OF_OPIOIDS | 0 | 0 |
| VTE_1 | 0 | 0 |
| Score_COMP_HIP_KNEE | 0 | 0 |
| Score_MORT_30_AMI | 0 | 0 |
| Score_MORT_30_COPD | 0 | 0 |
| Score_MORT_30_HF | 0 | 0 |
| Score_MORT_30_PN | 0 | 0 |
| Score_MORT_30_STK | 0 | 0 |
| Score_PSI_03 | 0 | 0 |
| Score_PSI_04 | 0 | 0 |
| Score_PSI_06 | 0 | 0 |
| Score_PSI_08 | 0 | 0 |
| Score_PSI_09 | 0 | 0 |
| Score_PSI_10 | 0 | 0 |
| Score_PSI_11 | 0 | 0 |
| Score_PSI_12 | 0 | 0 |
| Score_PSI_13 | 0 | 0 |
| Score_PSI_14 | 0 | 0 |
| Score_PSI_15 | 0 | 0 |
| FacilityName | 0 | 0 |
| State | 0 | 0 |
| Payment_PAYM_90_HIP_KNEE | 0 | 0 |
| StateCode | 0 | 0 |
# Average death rates amongst mortality variables and create new column "Score_Ovr_MORT"
HipKneeTest$Score_Ovr_MORT <- rowMeans(HipKneeTest[, c("Score_MORT_30_AMI",
"Score_MORT_30_COPD",
"Score_MORT_30_HF",
"Score_MORT_30_PN",
"Score_MORT_30_STK")],
na.rm = TRUE)
# Remove old mortality columns
HipKneeTest <- HipKneeTest[, !(names(HipKneeTest) %in% c("Score_MORT_30_AMI",
"Score_MORT_30_COPD",
"Score_MORT_30_HF",
"Score_MORT_30_PN",
"Score_MORT_30_STK"))]
# Compute correlation matrix
cor_matrix <- cor(HipKneeTest %>% select_if(is.numeric), use = "pairwise.complete.obs")
# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)
# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), name = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Figure 5. Correlation Heatmap of Numeric Variables After Imputation")
save(HipKneeTest, file = "HipKneeTest.RData")
# Create a summary table of descriptive statistics
descr_stats <- describe(HipKneeTrain)
# Remove the rows for FacilityId, FacilityName, State, and StateCode (vars 1, 23, 24, and 26)
descr_stats <- descr_stats %>% filter(!vars %in% c(1, 23, 24, 26))
# Remove columns 1, 2, 5, and 6 (vars, n, median, and trimmed)
descr_stats <- descr_stats[, -c(1, 2, 5, 6)]
# Create a table with kable
kable(descr_stats, format = "html", caption = "Descriptive Statistics for All Numeric Variables in Final Dataset") %>%
kable_styling(
bootstrap_options = c("hover", "striped", "responsive")
) %>%
column_spec(1, bold = TRUE) %>%
column_spec(2, width = "5em") %>%
row_spec(0, bold = TRUE, background = "#f2f2f2")
| | mean | sd | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| PredictedReadmissionRate_HIP_KNEE | 4.546561e+00 | 0.9093914 | 0.8576841 | 1.9279 | 8.569 | 6.6411 | 0.4373061 | 0.4908058 | 0.0212407 |
| HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE | 8.672450e+01 | 3.7230744 | 2.9652000 | 66.0000 | 98.000 | 32.0000 | -0.5851721 | 1.5050011 | 0.0869602 |
| EDV | 2.694490e+00 | 1.0250451 | 1.4826000 | 1.0000 | 4.000 | 3.0000 | -0.1385316 | -1.1578643 | 0.0239421 |
| HCP_COVID_19 | 8.887098e+01 | 9.5606361 | 8.3025600 | 32.6000 | 100.000 | 67.4000 | -1.3148000 | 2.1219028 | 0.2233087 |
| IMM_3 | 7.932570e+01 | 17.7478481 | 16.3086000 | 4.0000 | 100.000 | 96.0000 | -1.0958217 | 0.7241943 | 0.4145381 |
| OP_18b | 1.895739e+02 | 49.6783664 | 44.4780000 | 62.0000 | 587.000 | 525.0000 | 0.9817063 | 3.3080181 | 1.1603422 |
| OP_29 | 9.215876e+01 | 11.7440127 | 4.4478000 | 0.0000 | 100.000 | 100.0000 | -3.3057142 | 14.4641144 | 0.2743060 |
| SAFE_USE_OF_OPIOIDS | 1.557229e+01 | 4.2502679 | 2.9652000 | 0.0000 | 45.000 | 45.0000 | 0.5890847 | 3.1474800 | 0.0992739 |
| VTE_1 | 9.107365e+01 | 9.1561257 | 4.4478000 | 5.0000 | 100.000 | 95.0000 | -4.3773341 | 29.2741519 | 0.2138605 |
| Score_COMP_HIP_KNEE | 3.179924e+00 | 0.5539759 | 0.4447800 | 1.6000 | 6.200 | 4.6000 | 0.7280978 | 1.7214783 | 0.0129393 |
| Score_PSI_03 | 5.878123e-01 | 0.5183737 | 0.2816940 | 0.0500 | 6.310 | 6.2600 | 3.5027779 | 21.8351982 | 0.0121077 |
| Score_PSI_04 | 1.688142e+02 | 19.0239898 | 15.6117780 | 86.6800 | 241.810 | 155.1300 | -0.0656641 | 1.2542628 | 0.4443451 |
| Score_PSI_06 | 2.467703e-01 | 0.0452557 | 0.0296520 | 0.1200 | 0.510 | 0.3900 | 0.9085914 | 1.9551905 | 0.0010570 |
| Score_PSI_08 | 8.994540e-02 | 0.0082004 | 0.0000000 | 0.0600 | 0.130 | 0.0700 | 0.5502254 | 1.7580966 | 0.0001915 |
| Score_PSI_09 | 2.514146e+00 | 0.4953103 | 0.3409980 | 1.1000 | 6.100 | 5.0000 | 1.2648156 | 4.8211377 | 0.0115690 |
| Score_PSI_10 | 1.572951e+00 | 0.3731798 | 0.1482600 | 0.4700 | 4.550 | 4.0800 | 1.8287437 | 7.4056616 | 0.0087164 |
| Score_PSI_11 | 8.969864e+00 | 3.1123203 | 2.2239000 | 2.7300 | 49.000 | 46.2700 | 2.7479837 | 22.3393252 | 0.0726947 |
| Score_PSI_12 | 3.582755e+00 | 0.7944605 | 0.6671700 | 1.6100 | 7.510 | 5.9000 | 0.9512312 | 1.6446910 | 0.0185563 |
| Score_PSI_13 | 5.287103e+00 | 1.0312666 | 0.7413000 | 2.1700 | 10.790 | 8.6200 | 1.0438233 | 2.7498825 | 0.0240874 |
| Score_PSI_14 | 2.010540e+00 | 0.3569335 | 0.1927380 | 1.0700 | 4.400 | 3.3300 | 1.9129489 | 6.4857308 | 0.0083369 |
| Score_PSI_15 | 1.102668e+00 | 0.3271882 | 0.2223900 | 0.3500 | 3.430 | 3.0800 | 1.7127063 | 5.4101230 | 0.0076422 |
| Payment_PAYM_90_HIP_KNEE | 2.105666e+04 | 1943.7230953 | 1716.8508000 | 15936.0000 | 34916.000 | 18980.0000 | 0.7645293 | 1.9173982 | 45.3997188 |
| Score_Ovr_MORT | 1.305795e+01 | 1.1615024 | 1.0378200 | 8.1600 | 17.420 | 9.2600 | -0.0667113 | 0.5511229 | 0.0271293 |
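When rows of a summary table are dropped by index, `!=` and `%in%` behave differently: `!=` compares element by element and recycles the shorter vector, while `%in%` tests membership in the whole set. The sketch below illustrates the difference on a plain index vector; the length of 27 and the names `vars` and `rows_to_drop` are assumptions for illustration.

```r
# Row indices 1..27, standing in for the `vars` column returned by describe()
vars <- 1:27
rows_to_drop <- c(1, 23, 24, 26)

# `!=` compares element by element and recycles the shorter vector,
# so it does not test membership (R also warns about the length mismatch)
keep_recycled <- suppressWarnings(vars != rows_to_drop)

# `%in%` tests each index against the whole set
keep_membership <- !(vars %in% rows_to_drop)

print(vars[!keep_recycled])    # rows actually dropped by `!=`
print(vars[!keep_membership])  # rows dropped by `%in%`
```

A "longer object length is not a multiple of shorter object length" warning on a filter is usually a sign that `%in%` was intended.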
# Select numeric columns
numeric_columns <- HipKneeTrain %>% select_if(is.numeric)
# Melt the data for easier plotting with ggplot2
numeric_melted <- melt(numeric_columns)
## No id variables; using all as measure variables
# Create histograms
ggplot(numeric_melted, aes(x = value)) +
geom_histogram(bins = 30, fill = "blue", color = "black") +
facet_wrap(~variable, scales = "free_x") +
theme_minimal() +
labs(title = "Histograms of Numeric Variables", x = "Value", y = "Frequency")
# Select numeric columns for clustering
numeric_columns <- HipKneeTrain %>% select_if(is.numeric)
# Standardize features
X_scaled <- scale(numeric_columns)
# Determine optimal number of clusters using elbow plot
set.seed(123)
elbow_plot <- fviz_nbclust(X_scaled, kmeans, method = "wss", k.max = 10) +
labs(title = "Elbow Plot for Optimal k")
print(elbow_plot)
# Optimal K = 3
optimal_k <- 3
kmeans_result <- kmeans(X_scaled, centers = optimal_k, nstart = 25)
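The elbow plot is built from the total within-cluster sum of squares at each candidate k. A minimal base R sketch of that calculation follows, using the built-in `iris` data as a stand-in for the scaled hospital features; the percentage-drop summary is an added convenience, not part of the original analysis.

```r
# Built-in data as a stand-in for the scaled numeric hospital features
X <- scale(iris[, 1:4])

set.seed(123)
# Total within-cluster sum of squares for k = 1..6
wss <- sapply(1:6, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)

# Relative improvement from each additional cluster; the "elbow" is where
# this drop becomes small
pct_drop <- round(100 * -diff(wss) / head(wss, -1), 1)

print(round(wss, 1))
print(pct_drop)
```

Reading the relative drops alongside the plot can make the choice of k less subjective when the curve has no sharp bend.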
# Create a new df for K-Means Clustering results
HipKneeTrain_K_Means <- HipKneeTrain %>%
mutate(Cluster = as.factor(kmeans_result$cluster))
# Visualize clusters
fviz_cluster(kmeans_result, data = X_scaled,
ellipse.type = "convex",
palette = "jco",
ggtheme = theme_minimal())
# Cluster characteristics
cluster_summary <- HipKneeTrain_K_Means %>%
group_by(Cluster) %>%
summarise_if(is.numeric, mean, na.rm = TRUE)
print(cluster_summary)
## # A tibble: 3 × 24
## Cluster PredictedReadmission…¹ HcahpsLinearMeanValu…² EDV HCP_COVID_19 IMM_3
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4.73 85.0 2.51 80.6 63.3
## 2 2 4.19 88.2 2.53 91.8 85.8
## 3 3 4.98 85.9 3.19 92.4 84.7
## # ℹ abbreviated names: ¹PredictedReadmissionRate_HIP_KNEE,
## # ²HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## # ℹ 18 more variables: OP_18b <dbl>, OP_29 <dbl>, SAFE_USE_OF_OPIOIDS <dbl>,
## # VTE_1 <dbl>, Score_COMP_HIP_KNEE <dbl>, Score_PSI_03 <dbl>,
## # Score_PSI_04 <dbl>, Score_PSI_06 <dbl>, Score_PSI_08 <dbl>,
## # Score_PSI_09 <dbl>, Score_PSI_10 <dbl>, Score_PSI_11 <dbl>,
## # Score_PSI_12 <dbl>, Score_PSI_13 <dbl>, Score_PSI_14 <dbl>, …
# Visualize feature distributions across clusters
features_to_plot <- c("PredictedReadmissionRate_HIP_KNEE", "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "Score_COMP_HIP_KNEE", "SAFE_USE_OF_OPIOIDS")
for (feature in features_to_plot) {
p <- ggplot(HipKneeTrain_K_Means, aes(x = Cluster, y = .data[[feature]], fill = Cluster)) +
geom_boxplot() +
theme_minimal() +
labs(title = paste("Distribution of", feature, "across clusters"))
print(p)
}
# Perform PCA
pca_result <- prcomp(X_scaled, center = TRUE, scale. = TRUE)
# Visualize variance
fviz_eig(pca_result, addlabels = TRUE)
# Factor map
fviz_pca_var(pca_result, col.var = "contrib",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE)
# PC scores, three components
pca_scores <- as.data.frame(pca_result$x[, 1:3])
# Store PCA results in a new dataframe
HipKneeTrain_PCA <- HipKneeTrain %>%
select_if(is.numeric) %>%
bind_cols(pca_scores)
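One way to judge how many components to keep is the cumulative share of variance they explain. The sketch below computes it directly from a `prcomp` result, using the built-in `USArrests` data as a stand-in; the 80% threshold is an arbitrary assumption for illustration. It also checks that running PCA on data that were already standardized gives the same result as scaling inside `prcomp`.

```r
# Built-in data as a stand-in for the numeric hospital features
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components reaching an (arbitrary) 80% threshold
n_components <- which(cum_var >= 0.80)[1]

# Scaling before prcomp and again inside it does not change the result
pca_prescaled <- prcomp(scale(USArrests), center = TRUE, scale. = TRUE)
same_result <- isTRUE(all.equal(pca$sdev, pca_prescaled$sdev))

print(round(cum_var, 3))
print(n_components)
```

If the first three components of the hospital data explain only a modest share of the variance, plots based on them will show only part of the cluster structure.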
# Hierarchical Clustering
# Compute distance matrix
dist_matrix <- dist(X_scaled, method = "euclidean")
# Perform hierarchical clustering
hc_result <- hclust(dist_matrix, method = "ward.D2")
# Compute WCSS for different numbers of clusters
# (within a cluster, WCSS equals the sum of pairwise squared distances divided by the cluster size)
wcss <- sapply(1:10, function(k) {
clusters <- cutree(hc_result, k)
tot.withinss <- sum(sapply(unique(clusters), function(cl) {
pts <- X_scaled[clusters == cl, , drop = FALSE]
sum(dist(pts)^2) / nrow(pts)
}))
return(tot.withinss)
})
# Plot WCSS
plot(1:10, wcss, type = "b", xlab = "Number of Clusters", ylab = "WCSS")
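For reference, the within-cluster sum of squares can be computed either from distances to each cluster centroid or from pairwise distances divided by the cluster size; the two are mathematically equal. The sketch below checks this on the built-in `iris` data with a Ward clustering cut at three clusters (the data and k are stand-ins for illustration).

```r
# Built-in data as a stand-in for the scaled numeric hospital features
X <- scale(iris[, 1:4])
hc <- hclust(dist(X, method = "euclidean"), method = "ward.D2")
clusters <- cutree(hc, k = 3)

# WCSS from squared distances to each cluster centroid
wcss_centroid <- sum(sapply(unique(clusters), function(cl) {
  pts <- X[clusters == cl, , drop = FALSE]
  sum(sweep(pts, 2, colMeans(pts))^2)
}))

# WCSS from pairwise squared distances, divided by cluster size
wcss_pairwise <- sum(sapply(unique(clusters), function(cl) {
  pts <- X[clusters == cl, , drop = FALSE]
  sum(dist(pts)^2) / nrow(pts)
}))

print(c(centroid = wcss_centroid, pairwise = wcss_pairwise))
```

Without the division by cluster size, large clusters are weighted more heavily and the curve is no longer comparable to the k-means `tot.withinss` values.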
# Create clusters with optimal number of clusters from WCSS plot
k <- 3
hc_clusters <- cutree(hc_result, k = k)
# Store hierarchical clustering results in a new dataframe
HipKneeTrain_HC <- HipKneeTrain %>%
mutate(HC_Cluster = as.factor(hc_clusters))
# Visualize clusters using first three PCs
pca_plot_data <- cbind(pca_scores[, 1:3], Cluster = hc_clusters)
# Pass only the PC columns as data so the cluster labels are not treated as a feature
fviz_cluster(list(data = pca_plot_data[, 1:3], cluster = hc_clusters),
ellipse.type = "convex",
palette = "jco",
ggtheme = theme_minimal(),
main = "Hierarchical Clustering Results (PCA)")
# Analyze cluster characteristics
hc_cluster_summary <- HipKneeTrain_HC %>%
group_by(HC_Cluster) %>%
summarise_if(is.numeric, mean, na.rm = TRUE)
print(hc_cluster_summary)
## # A tibble: 3 × 24
## HC_Cluster PredictedReadmissionRat…¹ HcahpsLinearMeanValu…² EDV HCP_COVID_19
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1 4.82 85.3 2.51 85.1
## 2 2 4.24 88.0 2.62 91.0
## 3 3 4.72 86.5 3.22 91.4
## # ℹ abbreviated names: ¹PredictedReadmissionRate_HIP_KNEE,
## # ²HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## # ℹ 19 more variables: IMM_3 <dbl>, OP_18b <dbl>, OP_29 <dbl>,
## # SAFE_USE_OF_OPIOIDS <dbl>, VTE_1 <dbl>, Score_COMP_HIP_KNEE <dbl>,
## # Score_PSI_03 <dbl>, Score_PSI_04 <dbl>, Score_PSI_06 <dbl>,
## # Score_PSI_08 <dbl>, Score_PSI_09 <dbl>, Score_PSI_10 <dbl>,
## # Score_PSI_11 <dbl>, Score_PSI_12 <dbl>, Score_PSI_13 <dbl>, …
# Visualize feature distributions across clusters
features_to_plot <- c("PredictedReadmissionRate_HIP_KNEE", "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "Score_COMP_HIP_KNEE", "SAFE_USE_OF_OPIOIDS")
for (feature in features_to_plot) {
p <- ggplot(HipKneeTrain_HC, aes(x = HC_Cluster, y = .data[[feature]], fill = HC_Cluster)) +
geom_boxplot() +
theme_minimal() +
labs(title = paste("Distribution of", feature, "across Hierarchical Clusters"))
print(p)
}
> This is a preliminary segmentation analysis, and we plan to refine
and tidy it further. Our initial impression, however, is that
clustering may not be especially beneficial for this dataset. What
are your thoughts on the preliminary segmentation analysis? Do you have
any ideas for making the clusters more meaningful?
# Remove unwanted columns from the dataset
HipKneeTrain_RF <- HipKneeTrain %>%
select(-State, -FacilityName, -FacilityId)
# Define mtry parameter grid
grid <- expand.grid(
mtry = c(2, 4, 6, 8)
)
# Define CV
train_control <- trainControl(
method = "cv",
number = 7,
verboseIter = TRUE
)
# Train the Random Forest model with grid search
rf_grid_search <- train(
PredictedReadmissionRate_HIP_KNEE ~ .,
data = HipKneeTrain_RF,
method = "rf",
trControl = train_control,
tuneGrid = grid,
importance = TRUE,
ntree = 100
)
## + Fold1: mtry=2
## - Fold1: mtry=2
## + Fold1: mtry=4
## - Fold1: mtry=4
## + Fold1: mtry=6
## - Fold1: mtry=6
## + Fold1: mtry=8
## - Fold1: mtry=8
## + Fold2: mtry=2
## - Fold2: mtry=2
## + Fold2: mtry=4
## - Fold2: mtry=4
## + Fold2: mtry=6
## - Fold2: mtry=6
## + Fold2: mtry=8
## - Fold2: mtry=8
## + Fold3: mtry=2
## - Fold3: mtry=2
## + Fold3: mtry=4
## - Fold3: mtry=4
## + Fold3: mtry=6
## - Fold3: mtry=6
## + Fold3: mtry=8
## - Fold3: mtry=8
## + Fold4: mtry=2
## - Fold4: mtry=2
## + Fold4: mtry=4
## - Fold4: mtry=4
## + Fold4: mtry=6
## - Fold4: mtry=6
## + Fold4: mtry=8
## - Fold4: mtry=8
## + Fold5: mtry=2
## - Fold5: mtry=2
## + Fold5: mtry=4
## - Fold5: mtry=4
## + Fold5: mtry=6
## - Fold5: mtry=6
## + Fold5: mtry=8
## - Fold5: mtry=8
## + Fold6: mtry=2
## - Fold6: mtry=2
## + Fold6: mtry=4
## - Fold6: mtry=4
## + Fold6: mtry=6
## - Fold6: mtry=6
## + Fold6: mtry=8
## - Fold6: mtry=8
## + Fold7: mtry=2
## - Fold7: mtry=2
## + Fold7: mtry=4
## - Fold7: mtry=4
## + Fold7: mtry=6
## - Fold7: mtry=6
## + Fold7: mtry=8
## - Fold7: mtry=8
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 8 on full training set
# Best parameters
best_params <- rf_grid_search$bestTune
print(best_params)
## mtry
## 4 8
# Extract feature importances
best_rf_model <- rf_grid_search$finalModel
feature_importances <- importance(best_rf_model)
# Convert feature importances to a df
feature_importances_df <- as.data.frame(feature_importances)
feature_importances_df$Feature <- rownames(feature_importances_df)
# Sort importances by %IncMSE
sorted_by_inc_mse <- feature_importances_df %>%
  arrange(desc(`%IncMSE`))
# Sort importances by IncNodePurity
sorted_by_inc_node_purity <- feature_importances_df %>%
  arrange(desc(IncNodePurity))
# Print importances
cat("Feature Importances by %IncMSE:\n")
## Feature Importances by %IncMSE:
print(sorted_by_inc_mse)
## %IncMSE IncNodePurity
## Score_COMP_HIP_KNEE 10.72099517 113.08145609
## Payment_PAYM_90_HIP_KNEE 9.03883574 125.26854000
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 5.27534292 73.57559485
## OP_18b 4.15404171 66.40490249
## StateCode009 3.75181579 12.68932168
## Score_PSI_14 3.68517112 58.90547765
## EDV 3.54546541 29.93993860
## Score_PSI_03 3.46582190 58.03081970
## Score_PSI_10 3.23107560 58.84939459
## Score_PSI_11 3.08855153 62.16132416
## StateCode022 3.01081021 7.67914120
## Score_Ovr_MORT 2.54248526 56.71609772
## Score_PSI_12 2.30684585 57.92670746
## Score_PSI_15 2.20278425 53.99298910
## StateCode017 2.17699288 5.53669955
## Score_PSI_13 2.17475455 62.17694293
## VTE_1 2.12580910 47.62238798
## StateCode027 2.06830695 2.41220726
## StateCode030 2.04843083 4.02894867
## StateCode028 2.01652439 1.00660990
## StateCode040 1.90785092 2.84701791
## StateCode037 1.88293311 0.77661060
## StateCode002 1.66104276 1.30433146
## SAFE_USE_OF_OPIOIDS 1.62970944 48.55372793
## StateCode006 1.59630427 1.87440875
## Score_PSI_09 1.59564021 53.46189072
## Score_PSI_06 1.55882232 39.27187720
## Score_PSI_04 1.35238340 55.18931551
## StateCode014 1.32430609 5.75899367
## IMM_3 1.31862242 43.04444799
## StateCode044 1.25970392 1.55475060
## Score_PSI_08 1.09407851 17.99849530
## StateCode043 1.09398067 4.95968535
## StateCode007 0.91807859 1.34055175
## StateCode016 0.86359756 2.86664974
## OP_29 0.54582706 35.15294543
## StateCode045 0.46701246 0.26882544
## StateCode039 0.42256120 0.15466717
## StateCode029 0.35450222 0.41983563
## StateCode046 0.15190201 2.00772867
## StateCode021 0.06745647 2.12006229
## StateCode008 0.00000000 0.02570528
## HCP_COVID_19 -0.08579886 53.81821089
## StateCode035 -0.19417318 6.68021439
## StateCode042 -0.24927432 1.25474996
## StateCode034 -0.26070754 0.27529986
## StateCode032 -0.28329467 3.32535504
## StateCode018 -0.29430012 2.10533264
## StateCode036 -0.29518214 2.76191318
## StateCode013 -0.30227278 4.74041027
## StateCode003 -0.31581052 0.92501152
## StateCode038 -0.37765964 3.28462255
## StateCode005 -0.42203032 4.13871395
## StateCode047 -0.54564469 1.75726713
## StateCode049 -0.55564975 2.93855287
## StateCode026 -0.63405234 0.50697302
## StateCode023 -0.72584221 3.03005360
## StateCode019 -0.77874702 1.07206757
## StateCode004 -0.80746546 2.00122946
## StateCode025 -0.86678037 3.69146300
## StateCode048 -0.89423959 0.82992245
## StateCode012 -0.89584566 0.87503983
## StateCode024 -0.94335245 0.97347110
## StateCode033 -0.94710201 2.35772816
## StateCode015 -0.96797290 1.03562782
## StateCode050 -1.00503782 0.30357229
## StateCode031 -1.03104209 0.27458034
## StateCode041 -1.05841539 1.08506574
## StateCode010 -1.27820479 3.25993507
## StateCode011 -1.39915516 0.61320416
## StateCode020 -2.82194723 1.65521496
## Feature
## Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE
## Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## OP_18b OP_18b
## StateCode009 StateCode009
## Score_PSI_14 Score_PSI_14
## EDV EDV
## Score_PSI_03 Score_PSI_03
## Score_PSI_10 Score_PSI_10
## Score_PSI_11 Score_PSI_11
## StateCode022 StateCode022
## Score_Ovr_MORT Score_Ovr_MORT
## Score_PSI_12 Score_PSI_12
## Score_PSI_15 Score_PSI_15
## StateCode017 StateCode017
## Score_PSI_13 Score_PSI_13
## VTE_1 VTE_1
## StateCode027 StateCode027
## StateCode030 StateCode030
## StateCode028 StateCode028
## StateCode040 StateCode040
## StateCode037 StateCode037
## StateCode002 StateCode002
## SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS
## StateCode006 StateCode006
## Score_PSI_09 Score_PSI_09
## Score_PSI_06 Score_PSI_06
## Score_PSI_04 Score_PSI_04
## StateCode014 StateCode014
## IMM_3 IMM_3
## StateCode044 StateCode044
## Score_PSI_08 Score_PSI_08
## StateCode043 StateCode043
## StateCode007 StateCode007
## StateCode016 StateCode016
## OP_29 OP_29
## StateCode045 StateCode045
## StateCode039 StateCode039
## StateCode029 StateCode029
## StateCode046 StateCode046
## StateCode021 StateCode021
## StateCode008 StateCode008
## HCP_COVID_19 HCP_COVID_19
## StateCode035 StateCode035
## StateCode042 StateCode042
## StateCode034 StateCode034
## StateCode032 StateCode032
## StateCode018 StateCode018
## StateCode036 StateCode036
## StateCode013 StateCode013
## StateCode003 StateCode003
## StateCode038 StateCode038
## StateCode005 StateCode005
## StateCode047 StateCode047
## StateCode049 StateCode049
## StateCode026 StateCode026
## StateCode023 StateCode023
## StateCode019 StateCode019
## StateCode004 StateCode004
## StateCode025 StateCode025
## StateCode048 StateCode048
## StateCode012 StateCode012
## StateCode024 StateCode024
## StateCode033 StateCode033
## StateCode015 StateCode015
## StateCode050 StateCode050
## StateCode031 StateCode031
## StateCode041 StateCode041
## StateCode010 StateCode010
## StateCode011 StateCode011
## StateCode020 StateCode020
cat("\nFeature Importances by IncNodePurity:\n")
##
## Feature Importances by IncNodePurity:
print(sorted_by_inc_node_purity)
## %IncMSE IncNodePurity
## Payment_PAYM_90_HIP_KNEE 9.03883574 125.26854000
## Score_COMP_HIP_KNEE 10.72099517 113.08145609
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 5.27534292 73.57559485
## OP_18b 4.15404171 66.40490249
## Score_PSI_13 2.17475455 62.17694293
## Score_PSI_11 3.08855153 62.16132416
## Score_PSI_14 3.68517112 58.90547765
## Score_PSI_10 3.23107560 58.84939459
## Score_PSI_03 3.46582190 58.03081970
## Score_PSI_12 2.30684585 57.92670746
## Score_Ovr_MORT 2.54248526 56.71609772
## Score_PSI_04 1.35238340 55.18931551
## Score_PSI_15 2.20278425 53.99298910
## HCP_COVID_19 -0.08579886 53.81821089
## Score_PSI_09 1.59564021 53.46189072
## SAFE_USE_OF_OPIOIDS 1.62970944 48.55372793
## VTE_1 2.12580910 47.62238798
## IMM_3 1.31862242 43.04444799
## Score_PSI_06 1.55882232 39.27187720
## OP_29 0.54582706 35.15294543
## EDV 3.54546541 29.93993860
## Score_PSI_08 1.09407851 17.99849530
## StateCode009 3.75181579 12.68932168
## StateCode022 3.01081021 7.67914120
## StateCode035 -0.19417318 6.68021439
## StateCode014 1.32430609 5.75899367
## StateCode017 2.17699288 5.53669955
## StateCode043 1.09398067 4.95968535
## StateCode013 -0.30227278 4.74041027
## StateCode005 -0.42203032 4.13871395
## StateCode030 2.04843083 4.02894867
## StateCode025 -0.86678037 3.69146300
## StateCode032 -0.28329467 3.32535504
## StateCode038 -0.37765964 3.28462255
## StateCode010 -1.27820479 3.25993507
## StateCode023 -0.72584221 3.03005360
## StateCode049 -0.55564975 2.93855287
## StateCode016 0.86359756 2.86664974
## StateCode040 1.90785092 2.84701791
## StateCode036 -0.29518214 2.76191318
## StateCode027 2.06830695 2.41220726
## StateCode033 -0.94710201 2.35772816
## StateCode021 0.06745647 2.12006229
## StateCode018 -0.29430012 2.10533264
## StateCode046 0.15190201 2.00772867
## StateCode004 -0.80746546 2.00122946
## StateCode006 1.59630427 1.87440875
## StateCode047 -0.54564469 1.75726713
## StateCode020 -2.82194723 1.65521496
## StateCode044 1.25970392 1.55475060
## StateCode007 0.91807859 1.34055175
## StateCode002 1.66104276 1.30433146
## StateCode042 -0.24927432 1.25474996
## StateCode041 -1.05841539 1.08506574
## StateCode019 -0.77874702 1.07206757
## StateCode015 -0.96797290 1.03562782
## StateCode028 2.01652439 1.00660990
## StateCode024 -0.94335245 0.97347110
## StateCode003 -0.31581052 0.92501152
## StateCode012 -0.89584566 0.87503983
## StateCode048 -0.89423959 0.82992245
## StateCode037 1.88293311 0.77661060
## StateCode011 -1.39915516 0.61320416
## StateCode026 -0.63405234 0.50697302
## StateCode029 0.35450222 0.41983563
## StateCode050 -1.00503782 0.30357229
## StateCode034 -0.26070754 0.27529986
## StateCode031 -1.03104209 0.27458034
## StateCode045 0.46701246 0.26882544
## StateCode039 0.42256120 0.15466717
## StateCode008 0.00000000 0.02570528
## Feature
## Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE
## Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## OP_18b OP_18b
## Score_PSI_13 Score_PSI_13
## Score_PSI_11 Score_PSI_11
## Score_PSI_14 Score_PSI_14
## Score_PSI_10 Score_PSI_10
## Score_PSI_03 Score_PSI_03
## Score_PSI_12 Score_PSI_12
## Score_Ovr_MORT Score_Ovr_MORT
## Score_PSI_04 Score_PSI_04
## Score_PSI_15 Score_PSI_15
## HCP_COVID_19 HCP_COVID_19
## Score_PSI_09 Score_PSI_09
## SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS
## VTE_1 VTE_1
## IMM_3 IMM_3
## Score_PSI_06 Score_PSI_06
## OP_29 OP_29
## EDV EDV
## Score_PSI_08 Score_PSI_08
## StateCode009 StateCode009
## StateCode022 StateCode022
## StateCode035 StateCode035
## StateCode014 StateCode014
## StateCode017 StateCode017
## StateCode043 StateCode043
## StateCode013 StateCode013
## StateCode005 StateCode005
## StateCode030 StateCode030
## StateCode025 StateCode025
## StateCode032 StateCode032
## StateCode038 StateCode038
## StateCode010 StateCode010
## StateCode023 StateCode023
## StateCode049 StateCode049
## StateCode016 StateCode016
## StateCode040 StateCode040
## StateCode036 StateCode036
## StateCode027 StateCode027
## StateCode033 StateCode033
## StateCode021 StateCode021
## StateCode018 StateCode018
## StateCode046 StateCode046
## StateCode004 StateCode004
## StateCode006 StateCode006
## StateCode047 StateCode047
## StateCode020 StateCode020
## StateCode044 StateCode044
## StateCode007 StateCode007
## StateCode002 StateCode002
## StateCode042 StateCode042
## StateCode041 StateCode041
## StateCode019 StateCode019
## StateCode015 StateCode015
## StateCode028 StateCode028
## StateCode024 StateCode024
## StateCode003 StateCode003
## StateCode012 StateCode012
## StateCode048 StateCode048
## StateCode037 StateCode037
## StateCode011 StateCode011
## StateCode026 StateCode026
## StateCode029 StateCode029
## StateCode050 StateCode050
## StateCode034 StateCode034
## StateCode031 StateCode031
## StateCode045 StateCode045
## StateCode039 StateCode039
## StateCode008 StateCode008
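Printing every row of the importance table makes the leading features hard to spot among the state indicators. A minimal sketch of a top-n view follows; `top_importances()` and the toy table are hypothetical and use rounded, made-up values rather than the model output.

```r
# Made-up importance table standing in for feature_importances_df
toy_importance <- data.frame(
  IncMSE = c(10.7, 9.0, 5.3, -0.3),
  IncNodePurity = c(113.1, 125.3, 73.6, 4.7),
  row.names = c("Score_COMP_HIP_KNEE", "Payment_PAYM_90_HIP_KNEE",
                "HCAHPS_rating", "StateCode013")
)

# Hypothetical helper: return the top n rows by a chosen importance column
top_importances <- function(imp_df, by, n = 10) {
  ordered <- imp_df[order(imp_df[[by]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

top_by_purity <- top_importances(toy_importance, by = "IncNodePurity", n = 2)
print(top_by_purity)
```

Keeping the feature names as row names, rather than adding a separate `Feature` column, also avoids the wrapped second block in the printed output.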
# Remove columns from the test set to match train set
HipKneeTest_RF <- HipKneeTest %>%
  select(-State, -FacilityName, -FacilityId)
# Make predictions on test set
rf_predictions <- predict(rf_grid_search, newdata = HipKneeTest_RF)
# Actual values
actual_values <- HipKneeTest$PredictedReadmissionRate_HIP_KNEE
# Calculate RMSE
mse <- mean((rf_predictions - actual_values)^2)
rmse <- sqrt(mse)
# Calculate R-squared
ss_total <- sum((actual_values - mean(actual_values))^2)
ss_residual <- sum((rf_predictions - actual_values)^2)
r_squared <- 1 - (ss_residual / ss_total)
# Print RMSE and R-squared
cat("RMSE on test set:\n")
## RMSE on test set:
print(rmse)
## [1] 0.3616222
cat("\nR-squared on test set:\n")
##
## R-squared on test set:
print(r_squared)
## [1] 0.8417858
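The same RMSE and R-squared arithmetic is repeated for each model in this analysis, so it can be convenient to wrap it in one function. The sketch below is a hypothetical helper, `regression_metrics()`, checked on made-up numbers that are not taken from the hospital data.

```r
# Hypothetical helper: RMSE and R-squared (1 - SS_res / SS_tot) on held-out data
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  ss_res <- sum((actual - predicted)^2)
  ss_tot <- sum((actual - mean(actual))^2)
  c(RMSE = sqrt(ss_res / length(actual)), Rsquared = 1 - ss_res / ss_tot)
}

# Toy check with made-up values
toy_actual <- c(4.0, 4.5, 5.0, 5.5)
toy_pred <- c(4.1, 4.4, 5.2, 5.3)
toy_metrics <- regression_metrics(toy_actual, toy_pred)
print(round(toy_metrics, 4))
```

Using one definition for every model keeps the comparison consistent, since R-squared computed as a squared correlation can differ from the 1 - SS_res / SS_tot form used here.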
# Calculate residuals
residuals_rf <- actual_values - rf_predictions
# Residuals vs Fitted Values plot
ggplot(data = NULL, aes(x = rf_predictions, y = residuals_rf)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", se = FALSE, color = "blue") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Histogram of residuals
ggplot(data = NULL, aes(x = residuals_rf)) +
  geom_histogram(binwidth = 0.1, fill = "blue", alpha = 0.7, boundary = 0) +
  labs(title = "Histogram of Residuals",
       x = "Residuals",
       y = "Frequency") +
  theme_minimal()
# QQ plot of residuals
qqnorm(residuals_rf, main = "QQ Plot of Residuals")
qqline(residuals_rf, col = "red")
# Perform Durbin-Watson test for autocorrelation in residuals
dw_test_result <- dwtest(lm(residuals_rf ~ rf_predictions))
print(dw_test_result)
##
## Durbin-Watson test
##
## data: lm(residuals_rf ~ rf_predictions)
## DW = 1.8049, p-value = 1.44e-05
## alternative hypothesis: true autocorrelation is greater than 0
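The Durbin-Watson statistic is built from differences between successive residuals, so its value depends on the order of the rows. For cross-sectional hospital data the row order may carry no natural meaning, which is worth keeping in mind when reading the p-value. The sketch below computes the statistic by hand on made-up residuals to show that dependence; `dw_statistic()` is a hypothetical helper and does not reproduce the test above.

```r
# Hypothetical helper: Durbin-Watson statistic, the sum of squared successive
# differences of the residuals divided by their sum of squares
dw_statistic <- function(res) {
  sum(diff(res)^2) / sum(res^2)
}

set.seed(1)
toy_res <- rnorm(200)                      # made-up independent residuals
dw_original <- dw_statistic(toy_res)
dw_sorted <- dw_statistic(sort(toy_res))   # same values, different row order
print(c(original = dw_original, sorted = dw_sorted))
```

Values near 2 indicate little first-order autocorrelation, while values well below 2 indicate positive autocorrelation. Sorting the same residuals pushes the statistic toward 0 even though the values themselves are unchanged.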
# Separate predictors and response variable in the training set
# (FacilityId is kept as a predictor here, unlike in the Random Forest data)
x_train <- as.matrix(HipKneeTrain %>% select(-c(State, FacilityName, PredictedReadmissionRate_HIP_KNEE)))
y_train <- HipKneeTrain$PredictedReadmissionRate_HIP_KNEE
# Separate predictors and response variable in the test set
x_test <- as.matrix(HipKneeTest %>% select(-c(State, FacilityName, PredictedReadmissionRate_HIP_KNEE)))
y_test <- HipKneeTest$PredictedReadmissionRate_HIP_KNEE
# Define the grid of hyperparameters
searchGrid <- expand.grid(.alpha = seq(0, 1, length.out = 10),
                          .lambda = seq(0, 5, length.out = 15))
# Define the train control: 10-fold cross-validation repeated 5 times
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     search = "grid",
                     verboseIter = FALSE)
# Train the elastic net model, tuning alpha and lambda over the grid
elasticnet_model <- train(
  x = x_train,
  y = y_train,
  method = "glmnet",
  trControl = ctrl,
  tuneGrid = searchGrid
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
# Best hyperparameters
best_alpha <- elasticnet_model$bestTune$alpha
best_lambda <- elasticnet_model$bestTune$lambda
# Print best alpha and lambda
print(paste("Best Alpha: ", best_alpha))
## [1] "Best Alpha: 0"
print(paste("Best Lambda: ", best_lambda))
## [1] "Best Lambda: 0"
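Both selected values sit at the lower edge of the search grid. The warning about missing resampled performance measures is also consistent with the larger lambda values shrinking every coefficient to zero, which leaves constant predictions and an undefined R-squared. This is a likely explanation rather than a confirmed diagnosis. A log-spaced lambda grid is one common alternative; the bounds below (1e-4 to 1) and the name `searchGrid_log` are illustrative assumptions.

```r
# Log-spaced lambda grid (bounds are illustrative assumptions)
lambda_grid <- 10^seq(-4, 0, length.out = 15)
searchGrid_log <- expand.grid(
  alpha = seq(0, 1, length.out = 10),
  lambda = lambda_grid
)

dim(searchGrid_log)
range(searchGrid_log$lambda)
```

A grid like this places most candidates at small penalties, where the cross-validated error usually changes fastest, while keeping the number of fits the same as before.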
# Make predictions on the test set
predictions <- predict(elasticnet_model, newdata = x_test)
# Calculate RMSE
rmse <- sqrt(mean((predictions - y_test)^2))
# Print RMSE
print(paste("RMSE on Test Set: ", rmse))
## [1] "RMSE on Test Set: 0.799586716495415"
# Calculate performance metrics on the test set
performance <- postResample(pred = predictions, obs = y_test)
# Extract and print R-squared
r_squared <- performance["Rsquared"]
print(paste("R^2 on Test Set: ", r_squared))
## [1] "R^2 on Test Set: 0.226601204078256"
# Get the feature importance
important <- varImp(elasticnet_model)$importance
# Plot the feature importance
important %>%
  mutate(Feature = rownames(important)) %>%
  mutate(Feature = gsub("\\.", " ", Feature)) %>%
  arrange(desc(Overall)) %>%
  ggplot(aes(y = Overall, fill = Overall, x = fct_reorder(Feature, Overall))) +
  geom_col() +
  scale_fill_continuous(low = "lightblue", high = "darkblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Feature importance as determined by Elastic Net",
       x = "",
       y = "Importance",
       fill = "")
# Convert x_train and x_test back to data frames
x_train <- as.data.frame(x_train)
x_test <- as.data.frame(x_test)
# Ensure all columns are numeric
x_train[] <- lapply(x_train, as.numeric)
x_test[] <- lapply(x_test, as.numeric)
# Combine the response and the predictors into one data frame
train_data <- cbind(y_train = y_train, x_train)
# Define the grid for kernel types
# (tune() passes the column name straight to svm(), so it must be `kernel`, without a leading dot)
searchGrid_kernel <- expand.grid(
  kernel = c("linear", "polynomial", "radial", "sigmoid")
)
# Train the SVM model with kernel tuning
svm_tune_kernel <- tune(svm,
                        y_train ~ .,
                        data = train_data,
                        ranges = searchGrid_kernel,
                        tunecontrol = tune.control(
                          sampling = "cross",
                          cross = 10
                        )
)
# Extract the best kernel
best_kernel <- svm_tune_kernel$best.model$kernel
if (best_kernel == 0) {
kernel_description <- "Linear kernel"
} else if (best_kernel == 1) {
kernel_description <- "Polynomial kernel"
} else if (best_kernel == 2) {
kernel_description <- "Radial kernel"
} else if (best_kernel == 3) {
kernel_description <- "Sigmoid kernel"
} else {
kernel_description <- "Unknown kernel"
}
cat("Best Kernel Description:", kernel_description, "\n")
## Best Kernel Description: Radial kernel
# Define the grid for gamma
searchGrid_gamma <- expand.grid(
gamma = c(0.01, 0.1, 1)
)
# Train the SVM model with gamma tuning
svm_tune_gamma <- tune(svm,
y_train ~ .,
data = train_data,
ranges = searchGrid_gamma,
kernel = "radial",
tunecontrol = tune.control(
sampling = "cross",
cross = 10
)
)
# Extract the best gamma
best_gamma <- svm_tune_gamma$best.model$gamma
cat("Best Gamma:", best_gamma, "\n")
## Best Gamma: 0.01
# Define the grid for cost
# (svm() names this parameter `cost`; a column named `C` is not a recognized argument)
searchGrid_cost <- expand.grid(
  cost = c(0.1, 1, 10)
)
# Train the SVM model with cost tuning
svm_tune_cost <- tune(svm,
                      y_train ~ .,
                      data = train_data,
                      ranges = searchGrid_cost,
                      kernel = "radial",
                      tunecontrol = tune.control(
                        sampling = "cross",
                        cross = 10
                      )
)
# Extract the best cost
best_cost <- svm_tune_cost$best.model$cost
cat("Best Cost:", best_cost, "\n")
## Best Cost: 1
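Tuning the kernel, gamma, and cost one at a time holds the other parameters at their defaults, so combinations that work well together can be missed. A joint grid is one alternative. The sketch below assumes the `e1071` package and runs on made-up data (`toy_train`); the grid values mirror the ones used above, and the 5-fold setting is an arbitrary choice to keep the example fast.

```r
library(e1071)

set.seed(123)
# Made-up regression data standing in for train_data
toy_train <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
toy_train$y_train <- 2 * toy_train$x1 - toy_train$x2 + rnorm(80, sd = 0.3)

# Names in `ranges` must match svm() argument names (gamma, cost)
svm_tune_joint <- tune(
  svm,
  y_train ~ .,
  data = toy_train,
  kernel = "radial",
  ranges = list(gamma = c(0.01, 0.1, 1), cost = c(0.1, 1, 10)),
  tunecontrol = tune.control(sampling = "cross", cross = 5)
)

print(svm_tune_joint$best.parameters)
```

The `performances` element of the result lists the cross-validated error for every gamma and cost pair, which makes it easy to see whether the best combination sits on the edge of the grid.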
# Final model with best parameters
# (svm() takes the penalty as `cost`; an argument named `C` would be silently ignored)
svm_final <- svm(y_train ~ .,
                 data = train_data,
                 kernel = "radial",
                 cost = 1,
                 gamma = 0.01,
                 probability = TRUE)
# Make predictions on the test set
predictions <- predict(svm_final, x_test)
# Calculate RMSE
rmse <- sqrt(mean((predictions - y_test)^2))
cat("RMSE on Test Set:", rmse, "\n")
## RMSE on Test Set: 0.7568615
# Calculate R-squared
rss <- sum((y_test - predictions)^2)
tss <- sum((y_test - mean(y_test))^2)
r_squared <- 1 - (rss / tss)
cat("R-squared on Test Set:", r_squared, "\n")
## R-squared on Test Set: 0.3069442
# Compare the median of the target variable in the training and test sets
print(median(HipKneeTrain$PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE))
## [1] 4.4769
print(median(HipKneeTest$PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE))
## [1] 4.4769
# Calculate the median of the target variable from the training data
median_value <- median(HipKneeTrain$PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE)
# Categorize the target variable in the training data
HipKneeTrain_Qual <- HipKneeTrain %>%
  mutate(TargetCategory = ifelse(PredictedReadmissionRate_HIP_KNEE > median_value, 1, 0))
# Categorize the target variable in the testing data using the median from the training data
HipKneeTest_Qual <- HipKneeTest %>%
  mutate(TargetCategory = ifelse(PredictedReadmissionRate_HIP_KNEE > median_value, 1, 0))
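If the aim is to classify hospitals as preferred or non-preferred, the binary label would serve as the response, and the continuous rate it is derived from would need to be excluded from the predictors to avoid leakage. The sketch below shows one way to build such a frame. `make_classification_frame()`, the "Low"/"High" labels, and the toy rows are all hypothetical.

```r
# Hypothetical helper: build a classification frame whose target is the binary
# label, dropping the continuous rate so it cannot leak into the predictors
make_classification_frame <- function(df, rate_col, threshold, drop_cols = character()) {
  label <- factor(ifelse(df[[rate_col]] > threshold, "High", "Low"),
                  levels = c("Low", "High"))
  keep <- setdiff(names(df), c(rate_col, drop_cols))
  out <- df[, keep, drop = FALSE]
  out$TargetCategory <- label
  out
}

# Made-up rows for illustration only
toy_hospitals <- data.frame(
  FacilityId = c(1, 2, 3, 4),
  Score_COMP_HIP_KNEE = c(2.1, 3.0, 2.6, 3.4),
  PredictedReadmissionRate_HIP_KNEE = c(4.1, 4.9, 4.3, 5.2)
)
toy_threshold <- median(toy_hospitals$PredictedReadmissionRate_HIP_KNEE)
toy_class <- make_classification_frame(
  toy_hospitals, "PredictedReadmissionRate_HIP_KNEE", toy_threshold,
  drop_cols = "FacilityId"
)
print(toy_class)
```

A classifier could then be fit with a formula such as `TargetCategory ~ .`, and a factor response would lead most modelling functions to treat the task as classification rather than regression.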
# Remove unwanted columns from the dataset
HipKneeTrain_QualRF <- HipKneeTrain_Qual %>%
  select(-State, -FacilityName, -FacilityId)
# Define mtry parameter grid
grid <- expand.grid(
  mtry = c(2, 4, 6, 8)
)
# Define 7-fold cross-validation
train_control <- trainControl(
  method = "cv",
  number = 7,
  verboseIter = TRUE
)
# Train the Random Forest model with grid search
# (the response is still the continuous rate; TargetCategory, which is derived from it, enters as a predictor)
rf_grid_search_qual <- train(
  PredictedReadmissionRate_HIP_KNEE ~ .,
  data = HipKneeTrain_QualRF,
  method = "rf",
  trControl = train_control,
  tuneGrid = grid,
  importance = TRUE,
  ntree = 100
)
## + Fold1: mtry=2
## - Fold1: mtry=2
## + Fold1: mtry=4
## - Fold1: mtry=4
## + Fold1: mtry=6
## - Fold1: mtry=6
## + Fold1: mtry=8
## - Fold1: mtry=8
## + Fold2: mtry=2
## - Fold2: mtry=2
## + Fold2: mtry=4
## - Fold2: mtry=4
## + Fold2: mtry=6
## - Fold2: mtry=6
## + Fold2: mtry=8
## - Fold2: mtry=8
## + Fold3: mtry=2
## - Fold3: mtry=2
## + Fold3: mtry=4
## - Fold3: mtry=4
## + Fold3: mtry=6
## - Fold3: mtry=6
## + Fold3: mtry=8
## - Fold3: mtry=8
## + Fold4: mtry=2
## - Fold4: mtry=2
## + Fold4: mtry=4
## - Fold4: mtry=4
## + Fold4: mtry=6
## - Fold4: mtry=6
## + Fold4: mtry=8
## - Fold4: mtry=8
## + Fold5: mtry=2
## - Fold5: mtry=2
## + Fold5: mtry=4
## - Fold5: mtry=4
## + Fold5: mtry=6
## - Fold5: mtry=6
## + Fold5: mtry=8
## - Fold5: mtry=8
## + Fold6: mtry=2
## - Fold6: mtry=2
## + Fold6: mtry=4
## - Fold6: mtry=4
## + Fold6: mtry=6
## - Fold6: mtry=6
## + Fold6: mtry=8
## - Fold6: mtry=8
## + Fold7: mtry=2
## - Fold7: mtry=2
## + Fold7: mtry=4
## - Fold7: mtry=4
## + Fold7: mtry=6
## - Fold7: mtry=6
## + Fold7: mtry=8
## - Fold7: mtry=8
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 8 on full training set
# Best parameters
best_params_qual <- rf_grid_search_qual$bestTune
print(best_params_qual)
## mtry
## 4 8
# Extract feature importances
best_rf_model_qual <- rf_grid_search_qual$finalModel
feature_importances_qual <- importance(best_rf_model_qual)
# Convert feature importances to a df
feature_importances_df_qual <- as.data.frame(feature_importances_qual)
feature_importances_df_qual$Feature <- rownames(feature_importances_df_qual)
# Sort importances by %IncMSE
sorted_by_inc_mse_qual <- feature_importances_df_qual %>%
arrange(desc(`%IncMSE`))
# Sort importances by IncNodePurity
sorted_by_inc_node_purity_qual <- feature_importances_df_qual %>%
arrange(desc(IncNodePurity))
# Print importances
cat("Feature Importances by %IncMSE:\n")
## Feature Importances by %IncMSE:
print(sorted_by_inc_mse_qual)
## %IncMSE IncNodePurity
## TargetCategory 38.454609123 578.63382604
## Score_COMP_HIP_KNEE 6.413666141 67.48995812
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 4.622333459 50.95816204
## Payment_PAYM_90_HIP_KNEE 3.722806364 83.06700071
## SAFE_USE_OF_OPIOIDS 2.852048321 29.35317182
## Score_PSI_14 2.653725325 36.72781958
## VTE_1 2.512239008 29.78967520
## Score_PSI_03 2.128773781 31.74306556
## StateCode032 2.113116805 2.16227939
## StateCode021 1.937405988 1.25366628
## HCP_COVID_19 1.821532099 32.99200802
## StateCode009 1.778107174 6.74711997
## Score_PSI_13 1.763653930 41.22524664
## StateCode025 1.553549142 1.95282948
## Score_PSI_11 1.416802274 36.58100366
## EDV 1.403857398 18.20485109
## OP_18b 1.368865442 37.00636601
## StateCode022 1.352517555 4.88489896
## StateCode023 1.343645070 1.54623315
## StateCode006 1.308868129 1.22090515
## StateCode024 1.299575379 0.55496085
## Score_PSI_10 1.250005381 36.74989673
## StateCode028 1.230071836 0.49145597
## StateCode031 1.188907608 0.22355659
## StateCode048 1.157795409 0.39672396
## StateCode045 1.139693506 0.12850442
## Score_PSI_15 1.093584999 34.02149110
## StateCode037 1.052839978 0.34551259
## StateCode041 1.034193570 1.33429071
## StateCode017 0.971593965 2.43103308
## StateCode019 0.958972320 0.49065207
## Score_PSI_06 0.939302779 26.03873553
## Score_Ovr_MORT 0.932083058 35.69710847
## StateCode029 0.890420352 0.10667531
## StateCode047 0.822092601 0.98792929
## StateCode050 0.811717605 0.34359804
## Score_PSI_09 0.802014318 33.40547954
## Score_PSI_08 0.790888242 10.34977716
## StateCode018 0.700728640 1.09975845
## StateCode016 0.630153099 1.94136860
## IMM_3 0.532749164 25.96453714
## StateCode014 0.456820482 3.07326932
## StateCode030 0.275507316 3.94106466
## StateCode010 0.155168582 3.01003912
## StateCode005 0.101984414 2.24271140
## StateCode027 0.050054748 1.61240936
## Score_PSI_04 0.017521156 33.47246809
## StateCode026 0.010413998 0.31455293
## StateCode013 0.009620924 3.26198045
## StateCode008 0.000000000 0.05795786
## StateCode034 0.000000000 0.09302397
## Score_PSI_12 -0.065736075 39.11757199
## StateCode020 -0.094778269 1.05190932
## StateCode033 -0.104216661 1.26938981
## StateCode036 -0.254442553 1.73979288
## StateCode002 -0.259537322 0.89573831
## StateCode049 -0.264268724 1.76101886
## StateCode004 -0.312852304 0.95762855
## StateCode044 -0.452192444 0.69195191
## StateCode003 -0.516388223 0.72633260
## StateCode042 -0.554598913 0.91413875
## StateCode011 -0.582554187 0.21347918
## StateCode007 -0.797443408 0.84713126
## StateCode039 -1.005037815 0.14125555
## StateCode012 -1.299946492 0.35546830
## StateCode040 -1.306245033 1.59221626
## StateCode038 -1.431647141 1.87988082
## StateCode015 -1.447129987 0.69092473
## StateCode046 -1.460738686 1.14240479
## StateCode035 -1.493860931 3.56921597
## OP_29 -1.987606026 22.85985652
## StateCode043 -2.022322527 2.65139779
## Feature
## TargetCategory TargetCategory
## Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE
## SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS
## Score_PSI_14 Score_PSI_14
## VTE_1 VTE_1
## Score_PSI_03 Score_PSI_03
## StateCode032 StateCode032
## StateCode021 StateCode021
## HCP_COVID_19 HCP_COVID_19
## StateCode009 StateCode009
## Score_PSI_13 Score_PSI_13
## StateCode025 StateCode025
## Score_PSI_11 Score_PSI_11
## EDV EDV
## OP_18b OP_18b
## StateCode022 StateCode022
## StateCode023 StateCode023
## StateCode006 StateCode006
## StateCode024 StateCode024
## Score_PSI_10 Score_PSI_10
## StateCode028 StateCode028
## StateCode031 StateCode031
## StateCode048 StateCode048
## StateCode045 StateCode045
## Score_PSI_15 Score_PSI_15
## StateCode037 StateCode037
## StateCode041 StateCode041
## StateCode017 StateCode017
## StateCode019 StateCode019
## Score_PSI_06 Score_PSI_06
## Score_Ovr_MORT Score_Ovr_MORT
## StateCode029 StateCode029
## StateCode047 StateCode047
## StateCode050 StateCode050
## Score_PSI_09 Score_PSI_09
## Score_PSI_08 Score_PSI_08
## StateCode018 StateCode018
## StateCode016 StateCode016
## IMM_3 IMM_3
## StateCode014 StateCode014
## StateCode030 StateCode030
## StateCode010 StateCode010
## StateCode005 StateCode005
## StateCode027 StateCode027
## Score_PSI_04 Score_PSI_04
## StateCode026 StateCode026
## StateCode013 StateCode013
## StateCode008 StateCode008
## StateCode034 StateCode034
## Score_PSI_12 Score_PSI_12
## StateCode020 StateCode020
## StateCode033 StateCode033
## StateCode036 StateCode036
## StateCode002 StateCode002
## StateCode049 StateCode049
## StateCode004 StateCode004
## StateCode044 StateCode044
## StateCode003 StateCode003
## StateCode042 StateCode042
## StateCode011 StateCode011
## StateCode007 StateCode007
## StateCode039 StateCode039
## StateCode012 StateCode012
## StateCode040 StateCode040
## StateCode038 StateCode038
## StateCode015 StateCode015
## StateCode046 StateCode046
## StateCode035 StateCode035
## OP_29 OP_29
## StateCode043 StateCode043
cat("\nFeature Importances by IncNodePurity:\n")
##
## Feature Importances by IncNodePurity:
print(sorted_by_inc_node_purity_qual)
## %IncMSE IncNodePurity
## TargetCategory 38.454609123 578.63382604
## Payment_PAYM_90_HIP_KNEE 3.722806364 83.06700071
## Score_COMP_HIP_KNEE 6.413666141 67.48995812
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 4.622333459 50.95816204
## Score_PSI_13 1.763653930 41.22524664
## Score_PSI_12 -0.065736075 39.11757199
## OP_18b 1.368865442 37.00636601
## Score_PSI_10 1.250005381 36.74989673
## Score_PSI_14 2.653725325 36.72781958
## Score_PSI_11 1.416802274 36.58100366
## Score_Ovr_MORT 0.932083058 35.69710847
## Score_PSI_15 1.093584999 34.02149110
## Score_PSI_04 0.017521156 33.47246809
## Score_PSI_09 0.802014318 33.40547954
## HCP_COVID_19 1.821532099 32.99200802
## Score_PSI_03 2.128773781 31.74306556
## VTE_1 2.512239008 29.78967520
## SAFE_USE_OF_OPIOIDS 2.852048321 29.35317182
## Score_PSI_06 0.939302779 26.03873553
## IMM_3 0.532749164 25.96453714
## OP_29 -1.987606026 22.85985652
## EDV 1.403857398 18.20485109
## Score_PSI_08 0.790888242 10.34977716
## StateCode009 1.778107174 6.74711997
## StateCode022 1.352517555 4.88489896
## StateCode030 0.275507316 3.94106466
## StateCode035 -1.493860931 3.56921597
## StateCode013 0.009620924 3.26198045
## StateCode014 0.456820482 3.07326932
## StateCode010 0.155168582 3.01003912
## StateCode043 -2.022322527 2.65139779
## StateCode017 0.971593965 2.43103308
## StateCode005 0.101984414 2.24271140
## StateCode032 2.113116805 2.16227939
## StateCode025 1.553549142 1.95282948
## StateCode016 0.630153099 1.94136860
## StateCode038 -1.431647141 1.87988082
## StateCode049 -0.264268724 1.76101886
## StateCode036 -0.254442553 1.73979288
## StateCode027 0.050054748 1.61240936
## StateCode040 -1.306245033 1.59221626
## StateCode023 1.343645070 1.54623315
## StateCode041 1.034193570 1.33429071
## StateCode033 -0.104216661 1.26938981
## StateCode021 1.937405988 1.25366628
## StateCode006 1.308868129 1.22090515
## StateCode046 -1.460738686 1.14240479
## StateCode018 0.700728640 1.09975845
## StateCode020 -0.094778269 1.05190932
## StateCode047 0.822092601 0.98792929
## StateCode004 -0.312852304 0.95762855
## StateCode042 -0.554598913 0.91413875
## StateCode002 -0.259537322 0.89573831
## StateCode007 -0.797443408 0.84713126
## StateCode003 -0.516388223 0.72633260
## StateCode044 -0.452192444 0.69195191
## StateCode015 -1.447129987 0.69092473
## StateCode024 1.299575379 0.55496085
## StateCode028 1.230071836 0.49145597
## StateCode019 0.958972320 0.49065207
## StateCode048 1.157795409 0.39672396
## StateCode012 -1.299946492 0.35546830
## StateCode037 1.052839978 0.34551259
## StateCode050 0.811717605 0.34359804
## StateCode026 0.010413998 0.31455293
## StateCode031 1.188907608 0.22355659
## StateCode011 -0.582554187 0.21347918
## StateCode039 -1.005037815 0.14125555
## StateCode045 1.139693506 0.12850442
## StateCode029 0.890420352 0.10667531
## StateCode034 0.000000000 0.09302397
## StateCode008 0.000000000 0.05795786
## Feature
## TargetCategory TargetCategory
## Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE
## Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## Score_PSI_13 Score_PSI_13
## Score_PSI_12 Score_PSI_12
## OP_18b OP_18b
## Score_PSI_10 Score_PSI_10
## Score_PSI_14 Score_PSI_14
## Score_PSI_11 Score_PSI_11
## Score_Ovr_MORT Score_Ovr_MORT
## Score_PSI_15 Score_PSI_15
## Score_PSI_04 Score_PSI_04
## Score_PSI_09 Score_PSI_09
## HCP_COVID_19 HCP_COVID_19
## Score_PSI_03 Score_PSI_03
## VTE_1 VTE_1
## SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS
## Score_PSI_06 Score_PSI_06
## IMM_3 IMM_3
## OP_29 OP_29
## EDV EDV
## Score_PSI_08 Score_PSI_08
## StateCode009 StateCode009
## StateCode022 StateCode022
## StateCode030 StateCode030
## StateCode035 StateCode035
## StateCode013 StateCode013
## StateCode014 StateCode014
## StateCode010 StateCode010
## StateCode043 StateCode043
## StateCode017 StateCode017
## StateCode005 StateCode005
## StateCode032 StateCode032
## StateCode025 StateCode025
## StateCode016 StateCode016
## StateCode038 StateCode038
## StateCode049 StateCode049
## StateCode036 StateCode036
## StateCode027 StateCode027
## StateCode040 StateCode040
## StateCode023 StateCode023
## StateCode041 StateCode041
## StateCode033 StateCode033
## StateCode021 StateCode021
## StateCode006 StateCode006
## StateCode046 StateCode046
## StateCode018 StateCode018
## StateCode020 StateCode020
## StateCode047 StateCode047
## StateCode004 StateCode004
## StateCode042 StateCode042
## StateCode002 StateCode002
## StateCode007 StateCode007
## StateCode003 StateCode003
## StateCode044 StateCode044
## StateCode015 StateCode015
## StateCode024 StateCode024
## StateCode028 StateCode028
## StateCode019 StateCode019
## StateCode048 StateCode048
## StateCode012 StateCode012
## StateCode037 StateCode037
## StateCode050 StateCode050
## StateCode026 StateCode026
## StateCode031 StateCode031
## StateCode011 StateCode011
## StateCode039 StateCode039
## StateCode045 StateCode045
## StateCode029 StateCode029
## StateCode034 StateCode034
## StateCode008 StateCode008
# Remove columns from the test set to match train set
HipKneeTest_QualRF <- HipKneeTest_Qual %>%
  select(-State, -FacilityName, -FacilityId)
# Make predictions on test set
rf_predictions_qual <- predict(rf_grid_search_qual, newdata = HipKneeTest_QualRF)
# Actual values
actual_values_qual <- HipKneeTest_Qual$PredictedReadmissionRate_HIP_KNEE
# Calculate RMSE
mse_qual <- mean((rf_predictions_qual - actual_values_qual)^2)
rmse_qual <- sqrt(mse_qual)
# Calculate R-squared
ss_total_qual <- sum((actual_values_qual - mean(actual_values_qual))^2)
ss_residual_qual <- sum((rf_predictions_qual - actual_values_qual)^2)
r_squared_qual <- 1 - (ss_residual_qual / ss_total_qual)
# Print RMSE and R-squared
cat("RMSE on test set:\n")
## RMSE on test set:
print(rmse_qual)
## [1] 0.25337
cat("\nR-squared on test set:\n")
##
## R-squared on test set:
print(r_squared_qual)
## [1] 0.9223314
# Separate predictors and response variable in the training set
x_train_qual <- as.matrix(HipKneeTrain_Qual %>% select(-c(State, FacilityName, PredictedReadmissionRate_HIP_KNEE)))
y_train_qual <- HipKneeTrain_Qual$PredictedReadmissionRate_HIP_KNEE
# Separate predictors and response variable in the test set
x_test_qual <- as.matrix(HipKneeTest_Qual %>% select(-c(State, FacilityName, PredictedReadmissionRate_HIP_KNEE)))
y_test_qual <- HipKneeTest_Qual$PredictedReadmissionRate_HIP_KNEE
# Define the grid of hyperparameters
searchGrid <- expand.grid(alpha = seq(0, 1, length.out = 10),
                          lambda = seq(0, 5, length.out = 15))
# Define the train control
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     search = "grid",
                     verboseIter = FALSE)
# Train the elastic net model with repeated cross-validation over the grid
elasticnet_model_qual <- train(
  x = x_train_qual,
  y = y_train_qual,
  method = "glmnet",
  trControl = ctrl,
  tuneGrid = searchGrid
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
# Best hyperparameters
best_alpha_qual <- elasticnet_model_qual$bestTune$alpha
best_lambda_qual <- elasticnet_model_qual$bestTune$lambda
# Print best alpha and lambda
print(paste("Best Alpha: ", best_alpha_qual))
## [1] "Best Alpha: 1"
print(paste("Best Lambda: ", best_lambda_qual))
## [1] "Best Lambda: 0"
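A best lambda of 0 means the selected model carries no penalty, so the fit is essentially ordinary least squares and the chosen alpha has no effect on it. The sketch below writes out the elastic net penalty in the form glmnet documents it; the function name `elastic_net_penalty` and the coefficient vector are hypothetical and only illustrate the arithmetic.

```r
# Elastic net penalty as documented for glmnet:
# lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
elastic_net_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta_example <- c(1.5, -2, 0.25)  # hypothetical coefficients

# With lambda = 0 the penalty vanishes for every alpha
penalty_lasso <- elastic_net_penalty(beta_example, alpha = 1, lambda = 0)
penalty_ridge <- elastic_net_penalty(beta_example, alpha = 0, lambda = 0)

# With lambda > 0 the mixing parameter alpha matters again
penalty_mixed <- elastic_net_penalty(beta_example, alpha = 0.5, lambda = 1)
```

Since the grid runs from 0 to 5 in 15 steps, the next candidate after 0 is already about 0.36, so a finer grid near zero may be worth trying before concluding that regularization does not help.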
# Make predictions on the test set
predictions_qual <- predict(elasticnet_model_qual, newdata = x_test_qual)
# Calculate RMSE
rmse_qual <- sqrt(mean((predictions_qual - y_test_qual)^2))
# Print RMSE
print(paste("RMSE on Test Set: ", rmse_qual))
## [1] "RMSE on Test Set: 0.53153841567739"
# Calculate performance metrics on the test set
performance_qual <- postResample(pred = predictions_qual, obs = y_test_qual)
# Extract and print R-squared (postResample reports the squared correlation between predictions and observations)
r_squared_qual <- performance_qual["Rsquared"]
print(paste("R^2 on Test Set: ", r_squared_qual))
## [1] "R^2 on Test Set: 0.658177919678784"
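The R-squared printed here comes from `postResample`, which by default reports the squared correlation between predictions and observations, whereas the random forest and SVM blocks compute 1 - SS_residual / SS_total. The two agree only in special cases, so the values are not strictly comparable across models. The sketch below, using made-up numbers and hypothetical helper names `r2_sse` and `r2_cor`, shows how far they can diverge when predictions are biased.

```r
# Two common definitions of R-squared
r2_sse <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}
r2_cor <- function(obs, pred) {
  cor(obs, pred)^2
}

# Hypothetical observations and systematically biased predictions
obs_example  <- c(1, 2, 3, 4, 5)
pred_example <- 2 * obs_example + 3

r2_from_cor <- r2_cor(obs_example, pred_example)  # perfectly correlated
r2_from_sse <- r2_sse(obs_example, pred_example)  # heavily penalized for bias
```

Applying one helper such as `r2_sse` to all three models' test-set predictions would put the comparison on the same footing.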
# Get the feature importance
important_qual <- varImp(elasticnet_model_qual)$importance
# View the feature importance
important_qual %>%
  mutate(Feature = rownames(important_qual)) %>%
  mutate(Feature = gsub("\\.", " ", Feature)) %>%
  arrange(desc(Overall)) %>%
  ggplot(aes(y = Overall, fill = Overall, x = fct_reorder(Feature, Overall))) +
  geom_col() +
  scale_fill_continuous(low = "lightblue", high = "darkblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Feature importance as determined by Elastic Net",
       x = "",
       y = "Importance",
       fill = "")
# Convert x_train and x_test back to data frames
x_train_qual <- as.data.frame(x_train_qual)
x_test_qual <- as.data.frame(x_test_qual)
# Ensure all columns are numeric
x_train_qual[] <- lapply(x_train_qual, as.numeric)
x_test_qual[] <- lapply(x_test_qual, as.numeric)
# Combine the response and the predictors into a single training data frame
train_data_qual <- cbind(y_train_qual = y_train_qual, x_train_qual)
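# Define the grid for kernel types
# tune() passes each grid entry to svm() by name, so the entry must be called
# `kernel`; a name such as `.kernel` is not matched by svm() and every fit
# would fall back to the default radial kernel
searchGrid_kernel <- list(
  kernel = c("linear", "polynomial", "radial", "sigmoid")
)
# Train the SVM model with kernel tuning
svm_tune_kernel_qual <- tune(svm,
                             y_train_qual ~ .,
                             data = train_data_qual,
                             ranges = searchGrid_kernel,
                             tunecontrol = tune.control(
                               sampling = "cross",
                               cross = 10
                             )
)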
# Extract the best kernel
best_kernel_qual <- svm_tune_kernel_qual$best.model$kernel
if (best_kernel_qual == 0) {
  kernel_description_qual <- "Linear kernel"
} else if (best_kernel_qual == 1) {
  kernel_description_qual <- "Polynomial kernel"
} else if (best_kernel_qual == 2) {
  kernel_description_qual <- "Radial kernel"
} else if (best_kernel_qual == 3) {
  kernel_description_qual <- "Sigmoid kernel"
} else {
  kernel_description_qual <- "Unknown kernel"
}
cat("Best Kernel Description:", kernel_description_qual, "\n")
## Best Kernel Description: Radial kernel
# Define the grid for gamma
searchGrid_gamma <- expand.grid(
  gamma = c(0.01, 0.1, 1)
)
# Train the SVM model with gamma tuning
svm_tune_gamma_qual <- tune(svm,
                            y_train_qual ~ .,
                            data = train_data_qual,
                            ranges = searchGrid_gamma,
                            kernel = "radial",
                            tunecontrol = tune.control(
                              sampling = "cross",
                              cross = 10
                            )
)
# Extract the best gamma
best_gamma_qual <- svm_tune_gamma_qual$best.model$gamma
cat("Best Gamma:", best_gamma_qual, "\n")
## Best Gamma: 0.01
# Define the grid for cost
# The svm() argument is `cost`; a column named `C` is not matched by svm()
# and every fit would use the default cost of 1
searchGrid_cost <- expand.grid(
  cost = c(0.1, 1, 10)
)
# Train the SVM model with cost tuning
svm_tune_cost_qual <- tune(svm,
                           y_train_qual ~ .,
                           data = train_data_qual,
                           ranges = searchGrid_cost,
                           kernel = "radial",
                           tunecontrol = tune.control(
                             sampling = "cross",
                             cross = 10
                           )
)
# Extract the best cost
best_cost_qual <- svm_tune_cost_qual$best.model$cost
cat("Best Cost:", best_cost_qual, "\n")
## Best Cost: 1
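The searches above tune kernel, gamma, and cost one at a time, which can miss combinations that only work well together. A joint grid over gamma and cost is one alternative. The following is a minimal sketch on synthetic data, assuming the e1071 package is installed; `sim_data` and `joint_tune` are hypothetical names and the grid values simply mirror those used above.

```r
library(e1071)

set.seed(42)
# Synthetic regression data standing in for the hospital-level predictors
sim_data <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
sim_data$y <- sin(sim_data$x1) + 0.5 * sim_data$x2 + rnorm(120, sd = 0.1)

# Grid entries are named after svm() arguments so tune() can pass them through;
# all 3 x 3 = 9 combinations are evaluated with 5-fold cross-validation
joint_tune <- tune(
  svm,
  y ~ .,
  data = sim_data,
  kernel = "radial",
  ranges = list(gamma = c(0.01, 0.1, 1), cost = c(0.1, 1, 10)),
  tunecontrol = tune.control(sampling = "cross", cross = 5)
)

# Best combination and its cross-validated error (mean squared error for regression)
joint_tune$best.parameters
joint_tune$best.performance
```

The `performances` element of the result lists the error for every combination, which makes it easy to see whether the optimum sits on the edge of the grid and the ranges should be widened.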
# Final model with best parameters
# svm() takes `cost` (not `C`); the tuned values are reused instead of hard-coded
svm_final_qual <- svm(y_train_qual ~ .,
                      data = train_data_qual,
                      kernel = "radial",
                      cost = best_cost_qual,
                      gamma = best_gamma_qual)
# Make predictions on the test set
predictions_qual <- predict(svm_final_qual, newdata = x_test_qual)
# Calculate RMSE
rmse_qual <- sqrt(mean((predictions_qual - y_test_qual)^2))
cat("RMSE on Test Set:", rmse_qual, "\n")
## RMSE on Test Set: 0.5135517
# Calculate R-squared
rss_qual <- sum((y_test_qual - predictions_qual)^2)
tss_qual <- sum((y_test_qual - mean(y_test_qual))^2)
r_squared_qual <- 1 - (rss_qual / tss_qual)
cat("R-squared on Test Set:", r_squared_qual, "\n")
## R-squared on Test Set: 0.6809169